OpenClaw Pool Management Console

🖥 Tenants Management

Hosts (Group)
Shared Skills
Tenants
ID | Name | Status | Tags | vCPU | Memory | Disk | Guest IP Port | Rootfs | VM Health | Gateway | Actions

⚙ Claw Configuration

Config Templates

OpenClaw configuration templates for different LLM providers and models.

No custom templates. Tenants use the built-in default config.

MCP Tools (via AgentCore Gateway)

All tenant VMs auto-connect to the AgentCore Gateway and gain these MCP tools. Tool definitions live in deploy/lambda/agentcore_tools/ and deploy/stack.py.

AgentCore is not enabled, or no tools are registered yet. Set agentcore.enabled: true in config.yml and redeploy.

Skill Groups

Groups bundle skills together so a tenant can subscribe via group: "team-sre" instead of listing every skill. A tenant's effective skill set = tenant.skills ∪ group.skills. Tenants without scoping fields get every skill (legacy broadcast).
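A minimal sketch of the resolution rule above, assuming a tenant record with optional skills and group fields (the Tenant class, effective_skills helper and the example skill names are hypothetical). It treats a field that is unset (None) as "no scoping", an explicitly empty list as scoped, and an unknown group name as contributing nothing; the real console may validate group names instead.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set


@dataclass
class Tenant:
    # Hypothetical tenant record: only the two scoping fields matter here.
    name: str
    skills: Optional[List[str]] = None  # explicit skill names, or None if unset
    group: Optional[str] = None         # e.g. "team-sre", or None if unset


def effective_skills(
    tenant: Tenant,
    groups: Dict[str, List[str]],
    all_skills: Iterable[str],
) -> Set[str]:
    """Resolve the skills a tenant receives: tenant.skills ∪ group.skills."""
    # Legacy broadcast: a tenant with no scoping fields gets every shared skill.
    if tenant.skills is None and tenant.group is None:
        return set(all_skills)
    resolved = set(tenant.skills or [])
    if tenant.group is not None:
        # Assumption: an unknown group name contributes no skills.
        resolved |= set(groups.get(tenant.group, []))
    return resolved


groups = {"team-sre": ["k8s-debug", "oncall"]}
all_skills = ["k8s-debug", "oncall", "sql", "git"]
alice = Tenant("alice", skills=["sql"], group="team-sre")
print(sorted(effective_skills(alice, groups, all_skills)))  # ['k8s-debug', 'oncall', 'sql']
```

Returning a set makes the union semantics explicit: a skill listed both on the tenant and in its group is delivered once.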

No groups defined. Click + New Group to bundle skills for a team.

Shared Skills

Skills are shared across all tenants. They're plain markdown files in s3://${ASSETS_BUCKET}/skills/<name>/SKILL.md and are synced to every host every 5 minutes via cron, then injected into VMs at launch. Click a row to view / edit. Use Groups (above) to scope skills to specific tenants.
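To make the bucket layout concrete, here is a small sketch that maps skill names to object keys and recovers skill names from a list of keys. It assumes (hypothetically) that a folder counts as a skill only when it contains skills/<name>/SKILL.md; the helper names are illustrative. In practice the key list could come from boto3's list_objects_v2 paginator with Prefix="skills/"; the sketch takes a plain list so it runs offline.

```python
from typing import Iterable, List

SKILLS_PREFIX = "skills/"


def skill_key(name: str) -> str:
    """Object key (inside ${ASSETS_BUCKET}) where a skill's markdown lives."""
    return f"{SKILLS_PREFIX}{name}/SKILL.md"


def skill_names(keys: Iterable[str]) -> List[str]:
    """Names of skills that have a SKILL.md directly under skills/<name>/."""
    names = set()
    for key in keys:
        parts = key.split("/")
        # Expect exactly ["skills", "<name>", "SKILL.md"]; skip everything else.
        if len(parts) == 3 and parts[0] == "skills" and parts[1] and parts[2] == "SKILL.md":
            names.add(parts[1])
    return sorted(names)


keys = [
    "skills/oncall/SKILL.md",
    "skills/oncall/runbook.txt",  # extra file in a skill folder: not a new skill
    "skills/drafts/notes.md",     # no SKILL.md, so not treated as a skill
    "skills/sql/SKILL.md",
]
print(skill_names(keys))    # ['oncall', 'sql']
print(skill_key("oncall"))  # skills/oncall/SKILL.md
```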

No skills configured. Click + New Skill above (or upload to s3://${ASSETS_BUCKET}/skills/<name>/ directly).

Observability

Status

Prometheus (AMP)
Grafana (AMG)
SNS Notifications
💡 To enable: set metrics.enabled: true in config.yml and redeploy. The stack provisions Amazon Managed Prometheus and Grafana, and the ADOT collector on each host starts scraping about 3 minutes after rollout.

Per-VM Metrics

Each host's host-agent exposes these gauges on :8899/metrics. ADOT scrapes every 30s and remote-writes to AMP via SigV4 (no static credentials).

Metric | Type | Labels | Description
openclaw_vm_health | gauge (0/1) | tenant | 1 if VM responded to ping, else 0
openclaw_vm_cpu_pct | gauge | tenant | Per-VM CPU usage (percent of allocated vCPUs)
openclaw_vm_memory_used_mb | gauge | tenant | Per-VM memory in active use (MB, from VmRSS)
openclaw_vm_memory_balloon_mib | gauge | tenant | Balloon size held by the host (MiB)
openclaw_vm_disk_used_mb | gauge | tenant | Per-VM data disk used (MB)
openclaw_vm_disk_total_mb | gauge | tenant | Per-VM data disk capacity (MB)
openclaw_vm_disk_used_pct | gauge | tenant | Per-VM data disk used (percent)

Sample PromQL

Copy into Grafana → Explore → AMP datasource.

Memory used by all running VMs of a tenant
sum by (tenant) (openclaw_vm_memory_used_mb)
VMs that reported unhealthy at least once in the last minute
min_over_time(openclaw_vm_health[1m]) == 0
Tenants over 90% disk usage
openclaw_vm_disk_used_pct > 90

Endpoints

AMP remote_write:
Grafana:

💾 Backups

Tenant | Source Status | Backup Time | Size | Actions

Backups are retained for 7 days (S3 lifecycle). Orphan backups belong to tenants that have since been deleted; restoring one creates a new tenant with the backup's data volume.
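A sketch of the two rules in this panel, under stated assumptions: S3 lifecycle expiration adds the configured days to the object's creation time and rounds up to the following midnight UTC, so a backup can survive slightly longer than exactly 7 days; and a backup counts as an orphan when its tenant ID is no longer among the live tenants. The tenant_id field and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List, Set, Tuple

RETENTION_DAYS = 7  # matches the 7-day lifecycle retention stated above


def lifecycle_expiry(created: datetime, days: int = RETENTION_DAYS) -> datetime:
    """Approximate S3 expiry: creation time + days, rounded up to the next midnight UTC."""
    # `created` must be timezone-aware; S3 reports object times in UTC.
    due = (created + timedelta(days=days)).astimezone(timezone.utc)
    return datetime(due.year, due.month, due.day, tzinfo=timezone.utc) + timedelta(days=1)


def split_orphans(
    backups: Iterable[Dict[str, str]], live_tenant_ids: Set[str]
) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Separate backups whose tenant still exists from orphans of deleted tenants."""
    live, orphans = [], []
    for backup in backups:
        (live if backup["tenant_id"] in live_tenant_ids else orphans).append(backup)
    return live, orphans


created = datetime(2024, 3, 1, 10, 30, tzinfo=timezone.utc)
print(lifecycle_expiry(created))  # 2024-03-09 00:00:00+00:00
```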

🔧 Settings

API Connection

AgentCore

Status:

Infrastructure

Optional features and their current status. Toggle in config.yml and re-run ./setup.sh.

Multi-AZ HA
Prometheus + Grafana
AWS WAF
Cognito + RBAC
SNS Lifecycle Events
Per-tenant Quotas

Host Overcommit Ratios

Allocatable resources = physical × ratio. Tune the ratios under host: in config.yml.

CPU overcommit:
Memory overcommit:
Default per tenant:
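A worked sketch of the overcommit arithmetic, using illustrative numbers rather than this deployment's values: a 16 vCPU / 64 GiB host with a 2.0 CPU ratio and a 1.5 memory ratio, and a hypothetical 2 vCPU / 4 GiB default tenant. The Host class and max_tenants helper are assumptions; the real placement logic may also reserve host overhead, which this ignores.

```python
from dataclasses import dataclass


@dataclass
class Host:
    # Physical capacity plus the overcommit ratios configured under host:.
    vcpus: int
    memory_mib: int
    cpu_ratio: float
    memory_ratio: float

    def allocatable(self) -> tuple:
        """Allocatable resources = physical × ratio."""
        return self.vcpus * self.cpu_ratio, self.memory_mib * self.memory_ratio


def max_tenants(host: Host, vcpu_per_tenant: float, memory_mib_per_tenant: float) -> int:
    """How many equally sized tenants fit; the scarcer resource wins."""
    cpu_cap, mem_cap = host.allocatable()
    return int(min(cpu_cap // vcpu_per_tenant, mem_cap // memory_mib_per_tenant))


# Illustrative numbers only.
host = Host(vcpus=16, memory_mib=65536, cpu_ratio=2.0, memory_ratio=1.5)
print(host.allocatable())          # (32.0, 98304.0)
print(max_tenants(host, 2, 4096))  # 16
```

Here CPU is the binding constraint (32 / 2 = 16 tenants) even though overcommitted memory would fit 24, so raising the memory ratio alone would not add capacity.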

Fleet by AZ

Live distribution of registered hosts and their tenants across Availability Zones. Set multi_az.enabled: true in config.yml to spread the ASG.

Availability Zone | Hosts | VMs | vCPU used / total
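The per-AZ table is an aggregation over registered hosts. A minimal sketch, assuming each host reports hypothetical az, vms, vcpu_used and vcpu_total fields; whether vcpu_total is physical or post-overcommit capacity is not stated here, so treat that column's meaning as an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def fleet_by_az(hosts: Iterable[Dict]) -> Dict[str, Dict[str, int]]:
    """Aggregate per-host counters into one row per Availability Zone."""
    rows: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"hosts": 0, "vms": 0, "vcpu_used": 0, "vcpu_total": 0}
    )
    for h in hosts:
        row = rows[h["az"]]
        row["hosts"] += 1
        row["vms"] += h["vms"]
        row["vcpu_used"] += h["vcpu_used"]
        row["vcpu_total"] += h["vcpu_total"]
    return dict(sorted(rows.items()))


def format_rows(table: Dict[str, Dict[str, int]]) -> List[str]:
    """Render rows in the panel's column order."""
    return [
        f'{az} | {r["hosts"]} | {r["vms"]} | {r["vcpu_used"]} / {r["vcpu_total"]}'
        for az, r in table.items()
    ]


hosts = [
    {"az": "us-east-1a", "vms": 3, "vcpu_used": 6, "vcpu_total": 32},
    {"az": "us-east-1b", "vms": 1, "vcpu_used": 2, "vcpu_total": 32},
    {"az": "us-east-1a", "vms": 2, "vcpu_used": 4, "vcpu_total": 32},
]
for line in format_rows(fleet_by_az(hosts)):
    print(line)
# us-east-1a | 2 | 5 | 10 / 64
# us-east-1b | 1 | 1 | 2 / 32
```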

System

Site URL:
GitHub: aws-samples/sample-multi-tenant-openclaw-on-firecracker