Blamphs.ai

Documentation

Last updated: March 9, 2026

1. Getting Started

Welcome to Blamphs.ai! This guide will walk you through setting up autonomous GPU monitoring for your AWS infrastructure.

1.1 Sign Up

  1. Visit blamphs.ai/signup and create an account using Google, GitHub, LinkedIn, or email/password
  2. Your free trial starts immediately — no credit card required
  3. You'll be redirected to the onboarding flow

1.2 Connect Your AWS Account

Blamphs.ai uses a read-only IAM role to monitor your GPU infrastructure. We never have write access to your AWS resources.

  1. Create an IAM Role: In the AWS Console, go to IAM → Roles → Create Role
  2. Select Trusted Entity: Choose "Another AWS account" and enter our account ID (provided in the onboarding flow)
  3. Attach Policies: Add one of the following read-only options:
    • ReadOnlyAccess (AWS managed policy), or
    • A custom policy with minimal permissions: ec2:Describe*, cloudwatch:GetMetricStatistics, logs:GetLogEvents
  4. Copy the Role ARN: Save the ARN (looks like arn:aws:iam::123456789012:role/BlamphsReadOnly)
  5. Enter ARN in Blamphs: Paste the Role ARN into the Settings page and click "Connect"
  6. Verify Connection: Blamphs will scan your infrastructure and display detected GPU clusters
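
The trust relationship from step 2 and the minimal custom policy from step 3 can be sketched as JSON documents built in Python. This is an illustrative sketch, not an official Blamphs template: the account ID 111122223333 is a placeholder (the real ID is provided in the onboarding flow), and the function names are hypothetical.

```python
import json

# Placeholder only: the real Blamphs account ID is shown in the onboarding flow.
BLAMPHS_ACCOUNT_ID = "111122223333"


def build_trust_policy(trusted_account_id: str) -> dict:
    """Trust policy that lets another AWS account assume this role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def build_minimal_read_policy() -> dict:
    """Custom policy holding the three read-only permissions named in step 3."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "ec2:Describe*",
                    "cloudwatch:GetMetricStatistics",
                    "logs:GetLogEvents",
                ],
                "Resource": "*",
            }
        ],
    }


trust_policy = build_trust_policy(BLAMPHS_ACCOUNT_ID)
read_policy = build_minimal_read_policy()
print(json.dumps(trust_policy, indent=2))
print(json.dumps(read_policy, indent=2))
```

The printed JSON can be pasted into the IAM console's policy editor when creating the role.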

1.3 Configure Your First Cluster

Once connected, Blamphs automatically discovers GPU instances across all regions:

  • Auto-discovery: We detect EC2 instances in GPU instance families (p3, p4, g4dn, g5, etc.)
  • Set Constraints: Define monitoring rules in natural language (e.g., "Never exceed $50k/month" or "Keep utilization above 80%")
  • Enable Autonomous Actions: Choose which actions Blamphs can take automatically (scaling, node cordoning, rebooting)

1.4 Dashboard Overview

Your dashboard provides real-time insights:

  • Cluster Health: Live status of all GPU nodes
  • Utilization Metrics: GPU usage, memory, temperature, power draw
  • Cost Tracking: Current spend vs. budget, savings delivered
  • Event Log: All autonomous actions taken by Blamphs

2. Product Features

Blamphs.ai is an autonomous control plane for AWS GPU workloads. It runs 24/7, monitoring your infrastructure and taking action to optimize costs and prevent failures.

2.1 Autonomous Scaling

How it works: Blamphs continuously monitors GPU utilization across all nodes. When a GPU is idle (0% utilization) for longer than your configured threshold, Blamphs automatically scales it down.

  • Idle Detection: Tracks GPU usage, CUDA processes, and training job status
  • Smart Scale-Down: Waits for safe moments (between training epochs, after checkpoint saves)
  • Instant Scale-Up: Detects workload spikes and provisions capacity shortly before it is needed
  • Cost Savings: Customers average a 40% reduction in GPU spend
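
The idle rule described above (0% utilization for longer than a configured threshold) can be sketched as follows. This is a minimal illustration, not Blamphs's actual detector: the Sample type and both function names are hypothetical, and it looks only at utilization, ignoring CUDA processes and training job status.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Sample:
    timestamp: datetime
    gpu_utilization: float  # percent, 0.0 to 100.0


def idle_duration(samples: list[Sample]) -> timedelta:
    """Length of the trailing run of 0% samples, measured up to the newest sample."""
    if not samples:
        return timedelta(0)
    ordered = sorted(samples, key=lambda s: s.timestamp)
    idle_since = None
    # Walk backwards from the newest sample until the GPU was last busy.
    for sample in reversed(ordered):
        if sample.gpu_utilization > 0.0:
            break
        idle_since = sample.timestamp
    if idle_since is None:
        return timedelta(0)
    return ordered[-1].timestamp - idle_since


def should_scale_down(samples: list[Sample], threshold: timedelta) -> bool:
    """True when the GPU has been idle for longer than the threshold."""
    return idle_duration(samples) > threshold


# Ten readings taken five minutes apart: busy twice, then idle.
start = datetime(2026, 3, 9, 12, 0)
readings = [85.0, 40.0] + [0.0] * 8
samples = [Sample(start + timedelta(minutes=5 * i), u) for i, u in enumerate(readings)]
print(should_scale_down(samples, timedelta(minutes=30)))  # True: idle for 35 minutes
```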

2.2 Self-Healing Infrastructure

How it works: Blamphs parses CUDA logs, system metrics, and process health to detect failures before they cascade.

  • CUDA Error Detection: Identifies memory errors, driver crashes, and GPU lock-ups
  • Zombie Process Cleanup: Finds stuck training jobs hogging resources
  • Automatic Cordoning: Marks unhealthy nodes as unavailable and drains workloads
  • Node Recovery: Reboots failed nodes or replaces them with healthy capacity
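
A rule such as "reboot nodes that show CUDA errors twice in 10 minutes" amounts to counting errors inside a sliding time window. The sketch below shows one way to do that; the ErrorWindow class is hypothetical, it assumes error timestamps arrive in chronological order, and it does not parse real CUDA logs.

```python
from collections import deque
from datetime import datetime, timedelta


class ErrorWindow:
    """Tracks error timestamps for one node within a sliding time window."""

    def __init__(self, max_errors: int, window: timedelta):
        self.max_errors = max_errors
        self.window = window
        self._events = deque()

    def record(self, when: datetime) -> bool:
        """Record an error; return True once the node reaches the error limit."""
        self._events.append(when)
        # Discard errors that are older than the window.
        while self._events and when - self._events[0] > self.window:
            self._events.popleft()
        return len(self._events) >= self.max_errors


window = ErrorWindow(max_errors=2, window=timedelta(minutes=10))
t0 = datetime(2026, 3, 9, 12, 0)
first = window.record(t0)
second = window.record(t0 + timedelta(minutes=15))  # first error has expired
third = window.record(t0 + timedelta(minutes=20))   # two errors within 10 minutes
print(first, second, third)  # False False True
```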

2.3 Predictive Cost Management

How it works: Blamphs learns your training patterns and forecasts spend based on historical data.

  • Budget Guardrails: Set monthly or weekly spending limits
  • Trend Analysis: Predicts next month's bill based on current usage
  • Savings Reports: Shows exactly how much Blamphs has saved you
  • No Surprise Fees: Get alerted before you hit your budget cap
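
To illustrate how a budget alert can fire before the cap is reached, the sketch below projects month-end spend from the average daily spend so far. Plain linear extrapolation is a stand-in chosen for this example; the forecasting model Blamphs actually uses is not described here, and both function names are hypothetical.

```python
import calendar
from datetime import date


def forecast_month_end_spend(spend_to_date: float, today: date) -> float:
    """Run-rate projection: average daily spend so far times days in the month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    daily_rate = spend_to_date / today.day
    return daily_rate * days_in_month


def budget_alert(spend_to_date: float, today: date, monthly_budget: float) -> bool:
    """True when the projected month-end spend exceeds the budget."""
    return forecast_month_end_spend(spend_to_date, today) > monthly_budget


# $27,000 spent by March 9 is $3,000/day, or $93,000 over a 31-day month.
today = date(2026, 3, 9)
projected = forecast_month_end_spend(27_000, today)
print(projected, budget_alert(27_000, today, monthly_budget=80_000))
```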

2.4 Natural Language Configuration

How it works: Configure Blamphs by describing your constraints in plain English. No YAML, no config files.

  • Example Constraints:
    • "Scale down GPUs idle for more than 30 minutes"
    • "Never exceed $80k/month in GPU spend"
    • "Keep at least 4 p4d.24xlarge instances warm at all times"
    • "Reboot nodes that show CUDA errors twice in 10 minutes"
  • Real-time Validation: Blamphs confirms it understands your constraints before applying them
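
As a rough picture of what "understanding" a constraint could mean, the toy parser below turns two of the example sentences into structured rules and returns None for anything it does not recognize. It is purely illustrative: the Constraint type, its kind labels, and the regular expressions are assumptions made for this sketch and say nothing about how Blamphs interprets natural language.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Constraint:
    kind: str    # hypothetical labels: "budget_cap" or "idle_scale_down"
    value: float
    unit: str


_BUDGET = re.compile(r"never exceed \$(\d+(?:\.\d+)?)(k?)/(month|week)", re.IGNORECASE)
_IDLE = re.compile(r"idle for more than (\d+) (minute|hour)s?", re.IGNORECASE)


def parse_constraint(text: str) -> Optional[Constraint]:
    """Return a structured rule, or None when the sentence is not recognized."""
    match = _BUDGET.search(text)
    if match:
        amount = float(match.group(1)) * (1000 if match.group(2) else 1)
        return Constraint("budget_cap", amount, f"USD/{match.group(3).lower()}")
    match = _IDLE.search(text)
    if match:
        return Constraint("idle_scale_down", float(match.group(1)), match.group(2).lower() + "s")
    return None


budget = parse_constraint("Never exceed $80k/month in GPU spend")
idle = parse_constraint("Scale down GPUs idle for more than 30 minutes")
unknown = parse_constraint("Keep at least 4 p4d.24xlarge instances warm at all times")
print(budget, idle, unknown, sep="\n")
```

Returning None for an unrecognized sentence mirrors the validation step: a constraint that cannot be interpreted should be confirmed with the user rather than applied.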

2.5 What Blamphs Monitors

  • GPU Metrics: Utilization, memory usage, temperature, power draw
  • System Health: CPU, RAM, disk I/O, network throughput
  • CUDA Logs: Driver errors, OOM events, kernel timeouts
  • Training Status: Checkpoint saves, epoch completion, loss curves
  • Cost Data: EC2 on-demand pricing, spot pricing, reserved instance usage

3. Security & Privacy

Blamphs.ai is built with security as a core principle. We understand you're trusting us with access to your critical infrastructure.

3.1 AWS Credential Security

  • Read-Only Access: Blamphs never has write permissions to your AWS infrastructure
  • IAM Role-Based: Uses AWS IAM roles (no long-lived access keys)
  • Encrypted Storage: All credentials are stored with AES-256-GCM encryption at rest
  • TLS 1.3: All data transmitted over encrypted HTTPS connections
  • Revocable Access: Delete the IAM role anytime to instantly revoke access

3.2 Data Protection

What we collect:

  • GPU utilization metrics (%, memory, temperature)
  • EC2 instance metadata (instance type, region, availability zone)
  • CloudWatch logs (CUDA errors, system logs)
  • Cost and usage data (billing information)

What we DON'T collect:

  • Training data or model weights
  • Source code or application logic
  • Customer data processed by your workloads
  • SSH keys or database credentials

3.3 Infrastructure Security

  • Hosted on AWS: Blamphs runs on secure, SOC 2-compliant infrastructure
  • Network Isolation: Customer data is isolated using VPC segmentation
  • Access Controls: Role-based access with principle of least privilege
  • Audit Logs: All API requests and autonomous actions are logged
  • Regular Audits: Quarterly security reviews and penetration testing

3.4 Compliance

  • GDPR: Full compliance for European Economic Area users
  • CCPA: California Consumer Privacy Act compliance
  • SOC 2 Type II: Available for Enterprise plans
  • Australian Privacy Principles: Compliance for Australian users

3.5 Your Data Rights

  • Access: Request a copy of all data we've collected
  • Deletion: Delete your account and all associated data
  • Portability: Export your metrics and logs in JSON format
  • Revocation: Revoke AWS access instantly by deleting the IAM role

For security inquiries, contact security@blamphs.ai.

4. Billing & Pricing

Blamphs.ai offers transparent, predictable pricing with a free trial to get started.

4.1 Pricing Tiers

We offer three pricing tiers based on the size of your GPU infrastructure:

  • Starter: Up to 10 GPU nodes, $99/month
  • Growth: Up to 50 GPU nodes, $399/month
  • Enterprise: Unlimited nodes, custom pricing (contact sales)

All plans include:

  • Autonomous scaling and self-healing
  • Real-time monitoring and alerts
  • Cost tracking and savings reports
  • Natural language configuration
  • Email support (24-hour response time)

Enterprise plans add:

  • SOC 2 Type II compliance
  • Dedicated Slack channel
  • Custom integrations
  • SLA with 99.9% uptime guarantee

4.2 Free Trial

14-day free trial — no credit card required to start. The trial includes:

  • Full access to all Starter plan features
  • Monitor up to 10 GPU nodes
  • Real-time savings tracking
  • Email support

After the trial: You'll be prompted to select a paid plan. If you don't upgrade, your account is paused and we retain your data for at least 30 days.

4.3 How Billing Works

  • Monthly Billing: Charged on the same day each month (e.g., if you sign up on March 15, you're billed on the 15th)
  • Automatic Renewal: Subscriptions renew automatically unless canceled
  • Proration: If you upgrade mid-month, we prorate the difference
  • Payment Methods: Credit card (Visa, Mastercard, Amex), ACH transfer (Enterprise only)
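
As a worked example of proration, the sketch below charges the price difference for the unused part of the billing cycle. Day-based proration is an assumption made for illustration; the exact method is not specified here, and the function name is hypothetical.

```python
def prorated_upgrade_charge(
    old_price: float, new_price: float, days_remaining: int, days_in_cycle: int
) -> float:
    """Price difference scaled by the share of the billing cycle still remaining."""
    if days_in_cycle <= 0 or not 0 <= days_remaining <= days_in_cycle:
        raise ValueError("days_remaining must be between 0 and days_in_cycle")
    return round((new_price - old_price) * days_remaining / days_in_cycle, 2)


# Upgrading from Starter ($99) to Growth ($399) with 15 of 30 days remaining.
charge = prorated_upgrade_charge(99, 399, days_remaining=15, days_in_cycle=30)
print(charge)  # 150.0
```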

4.4 "No Surprise Fees" Guarantee

We hate surprise charges. Here's our promise:

  • Fixed Monthly Price: Your subscription cost never changes without notice
  • No Usage Fees: We charge by plan tier, based on node count, not per API call or data processed
  • 30-Day Notice: If we raise prices, you'll get 30 days' notice
  • Grandfathered Rates: Existing customers keep their current price for 12 months after a price increase

4.5 Savings Model

How much can I save? Our customers average a 40% reduction in GPU costs:

  • Idle GPU detection saves 25-35% on average
  • Spot instance optimization saves an additional 10-15%
  • Self-healing prevents costly downtime and manual intervention

Example: If you currently spend $10,000/month on GPU compute, Blamphs typically saves you $4,000/month. The $399 Growth plan pays for itself roughly 10x over.
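
The arithmetic behind this example can be checked with a few lines of Python. The function names are hypothetical, and the combined range assumes the two savings sources simply add up, as "an additional 10-15%" suggests.

```python
def monthly_savings(gpu_spend: float, reduction: float) -> float:
    """Dollar savings for a given monthly spend and fractional reduction."""
    return gpu_spend * reduction


def payback_multiple(savings: float, plan_price: float) -> float:
    """How many times over the savings cover the subscription price."""
    return savings / plan_price


savings = monthly_savings(10_000, 0.40)
multiple = payback_multiple(savings, 399)
print(round(savings, 2), round(multiple, 1))

# Combined savings range in percent: idle detection (25-35) plus spot optimization (10-15).
combined_range = (25 + 10, 35 + 15)
print(combined_range)  # (35, 50), which contains the 40% average
```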

4.6 Cancellation & Refunds

  • Cancel Anytime: No long-term contracts or cancellation fees
  • Effective Date: Cancellation takes effect at the end of your current billing period
  • Data Retention: We keep your data for 30 days after cancellation in case you reactivate
  • Refund Policy: We generally don't offer refunds, but contact support if you have issues

4.7 Enterprise & Custom Plans

Need more than 50 nodes? Have unique requirements? Contact our sales team:

  • Email: sales@blamphs.ai
  • We offer volume discounts, custom contracts, and on-prem deployment options

5. Support & Resources

Need help? Here's how to reach us:

  • Email Support: support@blamphs.ai (24-hour response time)
  • Documentation: Full guides at blamphs.ai/docs
  • Status Page: Check system status at blamphs.ai (look for "Systems nominal")
  • Security Issues: security@blamphs.ai
  • Sales Inquiries: sales@blamphs.ai

6. Frequently Asked Questions

Q: Can Blamphs accidentally shut down critical workloads?

A: No. The IAM role Blamphs uses to monitor your AWS infrastructure is read-only. We recommend scaling actions, and you control which of them we are allowed to execute. You can also add safety guardrails with constraints such as "never scale down nodes running training jobs".

Q: What if I already use Kubernetes or AWS Auto Scaling?

A: Blamphs complements existing tools. We integrate with Kubernetes to detect pod-level GPU usage and work alongside AWS Auto Scaling Groups. Think of us as an intelligent layer on top that understands GPU-specific workloads.

Q: How long does setup take?

A: Most customers are up and running in under 10 minutes. Creating the IAM role takes about 5 minutes, and Blamphs auto-discovers your infrastructure immediately afterward.

Q: Do you support multi-cloud (GCP, Azure)?

A: Not yet. We're AWS-only for now but plan to support GCP and Azure in Q3 2026. Join the waitlist in your dashboard settings.

Q: What happens if Blamphs goes down?

A: Your infrastructure keeps running normally. We monitor and optimize your workloads; we do not operate them. If Blamphs is unavailable, you temporarily lose autonomous management. Enterprise customers are covered by a 99.9% uptime SLA.

© 2026 Blamphs.ai · Ironclad Equity Pty Ltd