Ephemeral Systems: Engineering for Short-Lived Compute and Trust
Cost and Capacity
A long-form research note on designing distributed systems where workloads, identities, and policies are intentionally short-lived.
Cost model shift
Ephemeral systems shift the cost profile from a stable baseline to burst-heavy variability. Compute may look cheaper per unit, but control-plane load, identity issuance, telemetry, and cache invalidation can dominate operational cost if they are left unbounded.
Capacity planning dimensions
Capacity planning should include at least four independent curves:
- request throughput
- identity issuance/refresh volume
- policy distribution fan-out
- telemetry ingestion and query load
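Ignoring any one curve leaves a hidden saturation point: request capacity can scale out cleanly while identity issuance, for example, is already at its limit.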
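The four curves above can be compared on one scale by expressing each as utilization (projected peak divided by provisioned capacity). The following is a minimal sketch under assumptions: the `Curve` type, the curve names, and every number are hypothetical and only illustrate how the first curve to saturate can be found.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Curve:
    name: str
    peak_demand: float  # projected peak, in units per second (assumed figure)
    capacity: float     # provisioned limit, in the same units (assumed figure)

    @property
    def utilization(self) -> float:
        return self.peak_demand / self.capacity


def first_saturation(curves: list[Curve]) -> Curve:
    """Return the curve with the highest utilization, i.e. the one that saturates first."""
    return max(curves, key=lambda c: c.utilization)


# Hypothetical numbers for the four planning dimensions.
curves = [
    Curve("request_throughput", 8_000, 20_000),
    Curve("identity_issuance", 900, 1_000),
    Curve("policy_fanout", 300, 1_200),
    Curve("telemetry_ingest", 50_000, 80_000),
]

bottleneck = first_saturation(curves)
print(f"{bottleneck.name}: {bottleneck.utilization:.0%} utilized")
```

In this made-up scenario request throughput has ample headroom, yet identity issuance is the dimension closest to its limit, which is the kind of saturation point a throughput-only plan would miss.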
Hot-path budgeting
Every request path should have a “trust budget” in milliseconds. Example allocation:
- identity verification: 5-15 ms
- authorization check: 3-10 ms local or 10-40 ms remote
- telemetry emission (async): near-zero blocking
These budgets force explicit tradeoffs between strictness and latency.
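One way to make the tradeoff explicit is to sum the worst-case figures from the example allocation and compare them with a per-path budget. This is a sketch only: the `BudgetLine` type and the 30 ms budget are assumptions chosen for illustration, while the per-step ranges come from the allocation above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetLine:
    step: str
    low_ms: float
    high_ms: float


def worst_case_ms(lines: list[BudgetLine]) -> float:
    """Sum the upper bound of every blocking step on the path."""
    return sum(line.high_ms for line in lines)


def fits(lines: list[BudgetLine], trust_budget_ms: float) -> bool:
    return worst_case_ms(lines) <= trust_budget_ms


IDENTITY = BudgetLine("identity_verification", 5, 15)
TELEMETRY = BudgetLine("telemetry_async", 0, 0)  # async emission, treated as non-blocking

LOCAL_PATH = [IDENTITY, BudgetLine("authorization_local", 3, 10), TELEMETRY]
REMOTE_PATH = [IDENTITY, BudgetLine("authorization_remote", 10, 40), TELEMETRY]

TRUST_BUDGET_MS = 30  # hypothetical budget for one request path

for name, path in (("local", LOCAL_PATH), ("remote", REMOTE_PATH)):
    print(name, worst_case_ms(path), "ms worst case, fits:", fits(path, TRUST_BUDGET_MS))
```

Under these assumed numbers the local authorization path fits the budget in the worst case (25 ms) and the remote path does not (55 ms), so a team would have to either raise the budget, cache the remote decision, or accept a weaker check.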
Caching economics
Caching reduces cost but introduces staleness risk. A useful approach is a tiered cache policy:
- short TTL for high-risk privileges
- medium TTL for standard tenant actions
- longer TTL for low-risk read-only paths
Cache invalidation events should be measured as first-class platform traffic, not side effects.
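The tiered policy could look roughly like the sketch below. The `Risk` tiers, the TTL values, and the `DecisionCache` class are all hypothetical; the point is only that the TTL is chosen per risk tier and that explicit invalidations are counted so they can be reported as traffic in their own right.

```python
import time
from enum import Enum


class Risk(Enum):
    HIGH = "high"          # high-risk privileges
    STANDARD = "standard"  # standard tenant actions
    LOW = "low"            # low-risk read-only paths


# Illustrative TTLs in seconds; real values depend on the threat model.
TTL_SECONDS = {Risk.HIGH: 5, Risk.STANDARD: 60, Risk.LOW: 600}


class DecisionCache:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}      # key -> (decision, expires_at)
        self.invalidations = 0  # exported as a metric, not treated as a side effect

    def put(self, key, decision, risk: Risk) -> None:
        self._entries[key] = (decision, self._clock() + TTL_SECONDS[risk])

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        decision, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: drop and report a miss
            return None
        return decision

    def invalidate(self, key) -> None:
        # Count only invalidations that actually removed an entry.
        if self._entries.pop(key, None) is not None:
            self.invalidations += 1
```

Injecting the clock keeps the expiry logic testable without sleeping; a production cache would also need size bounds and thread safety, which are omitted here.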
Multi-tenant fairness
Autoscaling can hide noisy-neighbor problems temporarily, but fairness requires policy. Rate and concurrency limits should be tenant-aware and service-aware. Otherwise, one tenant’s burst can consume shared policy evaluation or identity issuance capacity.
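A tenant-aware and service-aware limit can be sketched as a token bucket keyed by the (tenant, service) pair. Everything named here (`TokenBucket`, `TenantServiceLimiter`, the rate and burst values) is an assumption for illustration, and the sketch covers rate limits only, not concurrency limits.

```python
import time


class TokenBucket:
    def __init__(self, rate_per_s: float, burst: int, clock):
        self.rate = rate_per_s
        self.burst = burst
        self.tokens = float(burst)
        self.clock = clock
        self.updated = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class TenantServiceLimiter:
    """One bucket per (tenant, service), so a burst drains only its own bucket."""

    def __init__(self, rate_per_s: float, burst: int, clock=time.monotonic):
        self._rate, self._burst, self._clock = rate_per_s, burst, clock
        self._buckets = {}

    def allow(self, tenant: str, service: str) -> bool:
        key = (tenant, service)
        bucket = self._buckets.get(key)
        if bucket is None:
            bucket = self._buckets[key] = TokenBucket(self._rate, self._burst, self._clock)
        return bucket.try_acquire()
```

With this shape, a tenant that exhausts its identity issuance bucket is rejected there while other tenants, and the same tenant's calls to other services, continue to be admitted.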
Spend observability
Link cost metrics to engineering domains:
- cost per decision type
- cost per tenant tier
- cost per policy rollout
- cost of degraded mode events
These views help teams optimize the architecture itself, not just the cloud invoice.
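These views amount to grouping the same cost events along different dimensions. A minimal sketch, assuming a hypothetical `CostEvent` record with costs held as integer micro-dollars to avoid floating-point drift; the field names and sample values are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CostEvent:
    decision_type: str         # e.g. "authz_remote", "token_issue"
    tenant_tier: str           # e.g. "free", "enterprise"
    rollout_id: Optional[str]  # policy rollout that caused the cost, if any
    degraded: bool             # True if incurred during a degraded-mode event
    micro_usd: int             # cost in millionths of a dollar


def cost_by(events: list[CostEvent], dimension: str) -> dict:
    """Total cost grouped by one attribute of CostEvent."""
    totals = defaultdict(int)
    for event in events:
        totals[getattr(event, dimension)] += event.micro_usd
    return dict(totals)


events = [
    CostEvent("authz_remote", "enterprise", None, False, 120),
    CostEvent("authz_remote", "free", None, False, 80),
    CostEvent("token_issue", "enterprise", "rollout-7", False, 300),
    CostEvent("token_issue", "free", "rollout-7", True, 500),
]

print(cost_by(events, "decision_type"))
print(cost_by(events, "tenant_tier"))
print(cost_by(events, "rollout_id"))
print(cost_by(events, "degraded"))
```

The same event stream answers all four questions, which is why tagging costs at emission time tends to be more useful than reconstructing them from invoices afterwards.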
Practical optimization order
When reducing costs, a practical order is:
- remove redundant decision calls
- reduce high-cardinality telemetry where not needed
- optimize token refresh cadence
- right-size control-plane fan-out and polling intervals
Premature micro-optimizations in worker code rarely produce the largest gains.
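The effect of token refresh cadence on issuance load can be estimated with simple arithmetic: if each workload refreshes once per interval, steady-state issuance is the number of workloads divided by that interval. The function below is a rough sketch; the function name, its parameters, and the sample figures are assumptions, and it ignores jitter, retries, and workload churn.

```python
def issuance_rate_per_s(workloads: int, token_ttl_s: float, refresh_fraction: float) -> float:
    """Steady-state token issuance rate when each workload refreshes
    after `refresh_fraction` of the token TTL has elapsed."""
    if not 0 < refresh_fraction <= 1:
        raise ValueError("refresh_fraction must be in (0, 1]")
    return workloads / (token_ttl_s * refresh_fraction)


# Hypothetical fleet: 10,000 workloads holding 300-second tokens.
early = issuance_rate_per_s(10_000, 300, 0.5)  # refresh at half the TTL
late = issuance_rate_per_s(10_000, 300, 0.8)   # refresh at 80% of the TTL
print(f"refresh at 50% of TTL: {early:.1f} issuances/s")
print(f"refresh at 80% of TTL: {late:.1f} issuances/s")
```

Under these assumptions, moving the refresh point from 50% to 80% of the TTL cuts steady-state issuance by more than a third, at the price of a smaller safety margin before expiry.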