Why I Choose Temporal Over AWS Lambda + EventBridge For Long‑Running Business Processes
2025-08-31
TL;DR
- Temporal gives you code‑as‑orchestration with built‑in durability, reliable timers, automatic retries/backoff, compensation, and rich visibility.
- Lambda + EventBridge is great for stateless event handling and decoupling, but you end up hand‑rolling orchestration concerns: state, correlation, idempotency, timeouts, retries, compensation, and observability.
- If your process is long‑running, multi‑step, needs exactly‑once effects at the business level, or has humans in the loop, Temporal dramatically reduces complexity and production risk.
The problem
Many teams start with AWS Lambda functions wired by EventBridge (and sometimes SQS/SNS) to stitch together business processes. It’s fast to begin, but as flows grow (sagas, fan‑out/fan‑in, human approvals, external service retries, SLAs), complexity balloons:
- Process state lives “everywhere”: in logs, DynamoDB, ad‑hoc status records, or in people’s heads.
- Retries and backoff are manual: each function handles its own errors; cross‑step coordination is fragile.
- Timers and delays are brittle: waiting hours/days means CloudWatch rules, DLQs, or custom schedulers.
- Compensation is inconsistent: each step must know how to roll back previous steps.
- Observability is fragmented: traces span multiple services; it’s hard to answer “where is my order?”
Temporal gives you a dedicated orchestration runtime with primitives designed for exactly these concerns.
Typical Lambda + EventBridge orchestration
A common pattern:
- Command received → publish EventBridge event.
- One Lambda handles step A → emits event → step B Lambda consumes, etc.
- For delays, add CloudWatch rules/EventBridge scheduled events.
- Persist state in DynamoDB with correlation IDs.
- Implement idempotency keys to survive retries and at‑least‑once delivery.
- Stitch together metrics/logs for visibility; hope all pieces agree.
This works, but you own the orchestration concerns forever.
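To give a feel for what "owning the orchestration concerns" means, here is a minimal sketch of just one of them: making a step handler safe under at‑least‑once delivery. Everything in it is hypothetical: `ProcessStore` is an in‑memory stand‑in for a DynamoDB table, and the event shape and the `handle_step_a` name are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ProcessStore:
    """In-memory stand-in for a DynamoDB table keyed by correlation ID (assumption)."""
    steps_done: Dict[str, Set[str]] = field(default_factory=dict)

    def mark_once(self, correlation_id: str, step: str) -> bool:
        """Record a step; return False if it was already recorded."""
        done = self.steps_done.setdefault(correlation_id, set())
        if step in done:
            return False
        done.add(step)
        return True


def handle_step_a(event: dict, store: ProcessStore, outbox: List[dict]) -> None:
    """Hypothetical 'step A' handler: do the work once, then emit the next event."""
    correlation_id = event["correlation_id"]
    if not store.mark_once(correlation_id, "step_a"):
        return  # duplicate delivery: skip the side effect and the follow-up event
    # ... the real side effect would go here ...
    outbox.append({"type": "StepACompleted", "correlation_id": correlation_id})


store, outbox = ProcessStore(), []
event = {"type": "CommandReceived", "correlation_id": "order-42"}
handle_step_a(event, store, outbox)
handle_step_a(event, store, outbox)  # simulated redelivery of the same event
print(len(outbox))  # 1
```

A real table would need a conditional write instead of the in‑memory check, plus a decision about whether to mark the step before or after the side effect. Multiply that by every step in the flow and the maintenance cost becomes clear.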
What Temporal changes
Temporal runs a fault‑tolerant orchestration service. You write:
- Workflows (deterministic, durable code) that describe the process.
- Activities (regular code) that call external systems (APIs, DBs, other services).
Temporal records every workflow decision, activity result, and timer in a durable event history. If a worker crashes or its host is recycled, another worker replays that history and the workflow resumes from the last recorded event. You get:
- Durable timers: sleep for minutes to months reliably (no CloudWatch hacks).
- Automatic retries/backoff: per activity with fine‑grained policies.
- Heartbeats: detect stalled activities.
- Signals/Queries: push external events in (human approvals, webhooks) and fetch live status.
- Saga pattern: compose child workflows/compensations cleanly.
- End‑to‑end visibility: one place to answer “what’s the state of workflow X?”
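The "resume from the event log" idea is easier to see in a toy model. The sketch below is plain Python and is not how Temporal is implemented; `run_workflow`, `make_step`, and the `crash_before` switch are hypothetical names used only to show that completed steps are read back from history instead of being executed again.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[], str]]
History = List[Tuple[str, str]]  # (step name, recorded result)


def run_workflow(steps: List[Step], history: History,
                 crash_before: Optional[str] = None) -> List[str]:
    """Run steps in order, reusing recorded results instead of re-executing."""
    results = []
    for index, (name, action) in enumerate(steps):
        if index < len(history):
            # Replay: this step already completed, so reuse its recorded result.
            results.append(history[index][1])
            continue
        if name == crash_before:
            raise RuntimeError(f"worker crashed before {name}")
        result = action()
        history.append((name, result))  # record the result before moving on
        results.append(result)
    return results


calls: List[str] = []  # tracks which steps really executed


def make_step(name: str) -> Step:
    def action() -> str:
        calls.append(name)
        return f"{name}:ok"
    return (name, action)


steps = [make_step("create_account"), make_step("send_welcome_email")]
history: History = []

try:
    run_workflow(steps, history, crash_before="send_welcome_email")
except RuntimeError:
    pass  # a new worker picks the workflow up with the same history

results = run_workflow(steps, history)
print(calls)  # ['create_account', 'send_welcome_email']
```

`create_account` runs exactly once even though the workflow function ran twice. This is also why workflow code has to be deterministic: replay only works if the same code makes the same decisions against the same history.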
A tiny Python example
# src/workflows/onboarding_workflow.py
from datetime import timedelta

from temporalio import workflow
from temporalio.common import RetryPolicy


@workflow.defn
class OnboardingWorkflow:
    @workflow.run
    async def run(self, user_id: str) -> None:
        # Execute activities with retries/backoff
        await workflow.execute_activity(
            "create_account",
            user_id,
            start_to_close_timeout=timedelta(minutes=5),
            retry_policy=RetryPolicy(maximum_attempts=5, backoff_coefficient=2),
        )
        await workflow.execute_activity(
            "send_welcome_email",
            user_id,
            start_to_close_timeout=timedelta(minutes=5),
        )
        # Durable sleep: Temporal persists the timer without billing compute
        await workflow.sleep(timedelta(days=3))
        await workflow.execute_activity(
            "send_nudge",
            user_id,
            start_to_close_timeout=timedelta(minutes=5),
        )


# src/activities.py
from temporalio import activity


@activity.defn
async def create_account(user_id: str) -> None:
    # call your APIs/DBs; raise on retryable failures
    ...


@activity.defn
async def send_welcome_email(user_id: str) -> None:
    # call email provider
    ...


@activity.defn
async def send_nudge(user_id: str) -> None:
    # call email/SMS provider
    ...
The workflow is your orchestration “source of truth.” Temporal persists progress and restarts it safely after failures or deployments.
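To make the retry policy on `create_account` concrete, here is a small stdlib‑only sketch of the wait schedule that `RetryPolicy(maximum_attempts=5, backoff_coefficient=2)` implies. `retry_waits` is a hypothetical helper, not part of the SDK, and the defaults it uses (1 second initial interval, maximum interval of 100 times that) are my reading of Temporal's documented defaults, so check them against your SDK version.

```python
from datetime import timedelta
from typing import List


def retry_waits(maximum_attempts: int,
                backoff_coefficient: float = 2.0,
                initial_interval: timedelta = timedelta(seconds=1),
                maximum_interval: timedelta = timedelta(seconds=100)) -> List[timedelta]:
    """Waits between attempts: min(initial * coefficient**n, maximum)."""
    waits = []
    for retry in range(maximum_attempts - 1):  # the first attempt has no wait
        wait = initial_interval * (backoff_coefficient ** retry)
        waits.append(min(wait, maximum_interval))
    return waits


schedule = retry_waits(maximum_attempts=5, backoff_coefficient=2)
print([w.total_seconds() for w in schedule])  # [1.0, 2.0, 4.0, 8.0]
```

Five attempts means four waits, so under these assumptions a persistently failing `create_account` gives up after roughly 15 seconds of backoff plus execution time. For a flaky downstream system you would probably raise `maximum_attempts` or set an explicit maximum interval.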
Head‑to‑head: Temporal vs Lambda + EventBridge
- State management
  - Lambda+EventBridge: DIY with DynamoDB + correlation IDs, prone to drift.
  - Temporal: workflow state is persisted automatically, strongly consistent at the workflow boundary.
- Retries/backoff
  - Lambda+EventBridge: each function owns its own retry, often inconsistent.
  - Temporal: tune per activity; built‑in exponential backoff, max attempts, jitter.
- Delays/timers
  - Lambda+EventBridge: CloudWatch/EventBridge schedules, DLQs, custom schedulers.
  - Temporal: durable timers as a first‑class primitive.
- Compensation (sagas)
  - Lambda+EventBridge: manual rollback logic across services.
  - Temporal: natural to encode compensations or use child workflows.
- Human‑in‑the‑loop
  - Lambda+EventBridge: custom endpoints/state machines to pause/resume.
  - Temporal: use Signals to pause/resume/approve from UIs or webhooks.
- Observability
  - Lambda+EventBridge: spread across logs, X-Ray, CloudWatch; hard to answer business‑level questions.
  - Temporal: every workflow has a timeline; queries return live state instantly.
- Idempotency & exactly‑once at business level
  - Lambda+EventBridge: must engineer idempotency carefully per handler.
  - Temporal: deterministic workflow execution + activity idempotency patterns reduce duplicate effects.
- Local development & testability
  - Lambda+EventBridge: needs stubs/mocks of AWS services.
  - Temporal: run Temporal locally; unit test workflows like normal code.
- Cost shape
  - Lambda+EventBridge: pay per invocation + per event; long waits require scheduled triggers and can accumulate orchestration glue code.
  - Temporal: pay for Temporal Cloud or self‑host the service; workers run only when tasks exist, and timers don’t consume compute while waiting.
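The compensation row is the one that benefits most from living in a single piece of code. Below is a minimal sketch of the saga shape in plain Python; `run_saga` and the step names are hypothetical, and inside a Temporal workflow each callable would be an awaited activity rather than a local function.

```python
from typing import Callable, List, Tuple

SagaStep = Tuple[Callable[[], None], Callable[[], None]]  # (action, compensation)


def run_saga(steps: List[SagaStep]) -> bool:
    """Run each action; on failure, undo the completed ones in reverse order."""
    compensations: List[Callable[[], None]] = []
    try:
        for action, compensation in steps:
            action()
            compensations.append(compensation)  # registered only after success
    except Exception:
        for compensation in reversed(compensations):
            compensation()
        return False
    return True


log: List[str] = []


def fail_shipping() -> None:
    raise RuntimeError("carrier unavailable")


ok = run_saga([
    (lambda: log.append("reserve_stock"), lambda: log.append("release_stock")),
    (lambda: log.append("charge_card"), lambda: log.append("refund_card")),
    (fail_shipping, lambda: log.append("cancel_shipment")),
])
print(ok, log)
# False ['reserve_stock', 'charge_card', 'refund_card', 'release_stock']
```

In the event‑driven version, the list of pending compensations has to be reconstructed from stored state by whichever handler notices the failure. In a workflow it is an ordinary local variable that survives worker restarts because the workflow itself is durable.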
Keeping EventBridge (and Lambdas) where they shine
EventBridge is excellent for event distribution and decoupling. Many teams do:
- Keep EventBridge for domain events (auditing, analytics, integrations).
- Use Temporal for orchestration of business processes.
- Wrap existing Lambdas as Temporal activities (via HTTP/API Gateway) to migrate progressively.
This lets you separate concerns cleanly: EventBridge spreads events, Temporal coordinates business workflows.
Migration playbook
- Pick one process with retries/timers/human steps (e.g., onboarding, order fulfillment).
- Create a coordinator workflow in Temporal that calls your existing services/Lambdas as activities.
- Add durable timers instead of CloudWatch schedules.
- Add retry/backoff policies per activity; implement idempotency once where effects occur.
- Use Signals/Queries to integrate portals/back‑office tools.
- Roll out observability: surface workflow status in dashboards.
Teams that make this move often report fewer orchestration bugs and fewer “where is my order?” support tickets.
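For the Signals/Queries step, it helps to picture the shape before touching the SDK. The sketch below is a plain‑asyncio stand‑in, not Temporal code: `ApprovalProcess`, `decide`, and `status` are hypothetical names. In Temporal, `decide` would be a signal handler, `status` a query handler, and the wait would be durable rather than held in process memory.

```python
import asyncio
from typing import List, Optional


class ApprovalProcess:
    """Plain-asyncio stand-in for a workflow that waits on a human decision."""

    def __init__(self) -> None:
        self._decided = asyncio.Event()
        self._approved: Optional[bool] = None

    def decide(self, approved: bool) -> None:
        """Signal-like entry point: called from a portal or webhook."""
        self._approved = approved
        self._decided.set()

    def status(self) -> str:
        """Query-like read of the current state; has no side effects."""
        if self._approved is None:
            return "waiting_for_approval"
        return "approved" if self._approved else "rejected"

    async def run(self) -> str:
        await self._decided.wait()  # pauses here without polling
        return self.status()


async def main() -> List[str]:
    process = ApprovalProcess()
    task = asyncio.create_task(process.run())
    await asyncio.sleep(0)   # let the process start and block on the decision
    before = process.status()
    process.decide(True)     # the human clicks "approve"
    after = await task
    return [before, after]


statuses = asyncio.run(main())
print(statuses)  # ['waiting_for_approval', 'approved']
```

The in‑memory version loses the pending approval if the process restarts, which is exactly the gap a durable workflow closes: the wait can last days and the back‑office tool can still query the status at any point.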
When Lambda + EventBridge may be enough
- One‑shot, stateless event handlers.
- Very short‑lived flows without retries/timers.
- Simple fan‑out with no coordination/compensation.
- Prototyping where orchestration complexity is minimal.
Conclusion
For long‑running, multi‑step, business‑critical processes, Temporal replaces a pile of bespoke glue with a coherent, reliable runtime and a much better developer experience. Keep EventBridge for decoupling and integrations, and promote orchestration to code you can reason about, test, and observe: Temporal workflows.