Open with energy.
Read the subtitle out loud - it's the hook.
This is a 25-minute talk: aim to land 3 ideas, not 30.
The 3 are: (1) state drifts across systems, (2) durable execution is application
code that survives process death, (3) you'd use it where you write runbooks today.
Ask for a show of hands: "who's been paged for a half-finished workflow?"
30 seconds max.
Establish credibility, then move on.
If the room is mixed (engineers + managers), mention you've shipped Temporal to
production at companies with both Java and Go stacks - it builds trust faster.
Use these as quick chat prompts.
Ask for "yes" / "no" or a one-word answer; do not discuss every response.
- Have you ever been paged for a half-finished business process?
- Have you had to manually reconcile whether an external API call succeeded?
- Have you written retry logic that later needed a retry limit, timeout, or
backoff policy?
- Have you debugged a stuck cron, queue consumer, or scheduled job after it was
already too late?
- Have you used a workflow/orchestration tool before: Airflow, Step Functions,
Camunda, Conductor, Argo, or something homegrown?
- If you already use Temporal, where: local experiments, one service, or
production-critical workflows?
Don't dwell. ~20 seconds.
The agenda is signposting so the audience can locate themselves later.
Keep it short - orientation, not detail.
Section divider.
Pause for 2-3 seconds.
The next 3 slides set up the pain the rest of the talk resolves.
Tone shift: slow down, get serious about real outages the audience has lived
through.
Read each bullet in a different voice.
The first sounds like an e-commerce backend, the second like a data team, the
third like an ops/devex flow.
The audience will recognize at least one - that's the point.
Land the final line with weight: "look easy until one step fails."
Pace yourself - each bullet is a real incident shape.
After the third one ("crashed between DB write and publish") pause; that's the
moment people nod because they remember a specific outage.
Closing line is the slogan to repeat: "runbook, not a button."
About 5 minutes for the four stacks combined.
Don't bash any tool - each solves a real problem.
The framing is "what each leaves to you."
Quick.
Cron isn't a strawman - it's where many teams start.
The reason it breaks is incidental complexity that nobody owns: logs scattered,
lock files invented, monitoring bolted on.
For Airflow users in the room, validate that Airflow IS great for what it was
built for.
The pitch is: don't replace your DAGs - move the cross-system flows out of
Airflow into Temporal, keep the data DAGs in Airflow.
The JSON-state-machine point lands hardest with engineers.
Ask: "would you review a 4000-line JSON file for an order workflow as readily as Java?"
The Lambda 15-min cap is the sneaky one - many teams hit it and don't realize it
for months.
Don't position as Kafka vs Temporal.
Position as Kafka + Temporal: Kafka is the bus between teams, Temporal is the
brain inside a team.
The quote at the bottom is the line they'll quote back at you - say it slowly.
The reframe.
The problem isn't any single tool - it's that workflow state is scattered across
N+1 places.
Make eye contact, lean into "drift is what the 2 AM page is." It's the bridge to
the Temporal section.
Tone shift again - from problem to solution.
Energy back up.
This is the talk's turning point; if you're 11 minutes in, you're on schedule.
THIS IS THE CENTRAL CONCEPT - this slide defines the term, the next one shows it.
Spend ~30 seconds here on the definition. The persistence is automatic; you don't
write checkpoint/restore code.
Then advance to the code.
Spend ~60 seconds here.
Walk the code: this is normal Java. There's no special framework. The methods are
just method calls.
The MAGIC is the last line.
Then say: "the Workflow doesn't care which JVM is running it. The state lives in
the cluster, not on a host."
Go is often the clearest SDK for engineers coming from backend services.
Point out the shape: ExecuteActivity records a command in history, and Get waits
for the durable result.
If a Worker dies after reserveInventory, replay rebuilds paymentID and
reservationID from history before scheduling ship.
Use this to defuse "is this Java-only?" concerns.
Python is async-first, but the mental model is the same: durable Workflow
decisions, side effects in Activities, result replay from history.
The 4 steps are the entire model.
If they only remember this one slide, the rest follows.
Use a whiteboard metaphor: "imagine someone took notes of every decision your
program made; you can replay those notes to recreate the program's state."
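If someone asks how the "notes" metaphor works mechanically, a toy model helps.
The sketch below is plain Python, not the Temporal SDK: `Context`,
`execute_activity`, `WorkerCrash`, and the three activity functions are
hypothetical names invented for illustration, and the history is just a list.
It assumes the simplest possible model: each Activity result is appended to
history, and replay reads results back instead of running the side effect again.

```python
"""Toy model of history-based replay. Illustrative only; not the Temporal SDK."""

side_effects = []  # records every time an Activity body really runs


class WorkerCrash(Exception):
    """Simulates the Worker process dying mid-Workflow."""


def charge_payment(order_id):
    side_effects.append("charge")
    return f"pay-{order_id}"


def reserve_inventory(order_id):
    side_effects.append("reserve")
    return f"res-{order_id}"


def ship(order_id, payment_id, reservation_id):
    side_effects.append("ship")
    return f"shipped {order_id} ({payment_id}, {reservation_id})"


class Context:
    """Serves recorded results first; runs Activities only past the end of history."""

    def __init__(self, history, crash_after=None):
        self.history = history        # durable: survives the simulated crash
        self.position = 0             # in-memory: rebuilt on every replay
        self.crash_after = crash_after

    def execute_activity(self, fn, *args):
        if self.position < len(self.history):
            # Replay: reuse the recorded result; the side effect does not run again.
            name, result = self.history[self.position]
            if name != fn.__name__:
                raise RuntimeError("workflow code no longer matches history")
        else:
            # First execution: run the Activity and record its result.
            result = fn(*args)
            self.history.append((fn.__name__, result))
            if self.crash_after == fn.__name__:
                raise WorkerCrash(fn.__name__)
        self.position += 1
        return result


def order_workflow(ctx, order_id):
    payment_id = ctx.execute_activity(charge_payment, order_id)
    reservation_id = ctx.execute_activity(reserve_inventory, order_id)
    return ctx.execute_activity(ship, order_id, payment_id, reservation_id)


history = []
try:
    # First Worker dies right after reserve_inventory is recorded.
    order_workflow(Context(history, crash_after="reserve_inventory"), "o-1")
except WorkerCrash:
    pass

# A fresh Worker replays the same code against the same history.
result = order_workflow(Context(history), "o-1")
print(result)
print(side_effects)
```

The second run rebuilds `payment_id` and `reservation_id` from history, so the
charge and reservation each happen once even though the Workflow code ran twice.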
Quick fly-through - 90 seconds.
Don't go deep on any one.
The point is breadth: ALL of these are built-in.
Each bullet represents code your team currently writes and maintains.
The "Workflow.sleep(Duration.ofDays(30))" line gets a chuckle from people who have
written cron-replacement logic.
30 seconds.
This is the mental model summary.
Most important: Task Queue is JUST a string - it's not Kafka.
It's not a database.
It's a routing key.
This often confuses people coming from message-queue thinking.
Section divider.
The next 3 slides are the "show, don't tell" moment.
Each slide is a complete pattern in ~15 lines of Java.
Walk top to bottom.
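Stop on `saga.addCompensation(...)` - explain that this is registered IMMEDIATELY
after the forward step succeeds.
If anything throws between the forward step and the compensation registration,
the catch handler runs without that compensation and the step is never undone.
A Worker crash alone is not the risk: replay re-runs the registration.
So you write them paired, on adjacent lines.
On failure, the catch handler runs the registered compensations in LIFO order.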
Compensations also retry.
The point is at the bottom.
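For a backup explanation of the pairing and the LIFO unwind, here is a minimal
sketch in plain Python. It is not the Temporal `Saga` class: the `Saga` below,
its `add_compensation` and `compensate` methods, and the order functions are
hypothetical stand-ins, and it does not model retries of compensations.

```python
"""Toy saga: paired compensations, unwound in LIFO order. Not the Temporal SDK."""


class Saga:
    def __init__(self):
        self._compensations = []

    def add_compensation(self, fn, *args):
        # Register the undo step for a forward step that has already succeeded.
        self._compensations.append((fn, args))

    def compensate(self):
        # Undo in reverse order of registration (last in, first out).
        while self._compensations:
            fn, args = self._compensations.pop()
            fn(*args)


log = []


def charge(order_id):
    log.append(f"charge {order_id}")


def refund(order_id):
    log.append(f"refund {order_id}")


def reserve(order_id):
    log.append(f"reserve {order_id}")


def release(order_id):
    log.append(f"release {order_id}")


def ship(order_id):
    raise RuntimeError("carrier unavailable")


def order_workflow(order_id):
    saga = Saga()
    try:
        charge(order_id)
        saga.add_compensation(refund, order_id)    # paired with charge
        reserve(order_id)
        saga.add_compensation(release, order_id)   # paired with reserve
        ship(order_id)                             # fails; it has nothing to undo
        return "completed"
    except RuntimeError:
        saga.compensate()
        return "compensated"


outcome = order_workflow("o-1")
print(outcome)
print(log)
```

The log shows the reservation released before the payment is refunded, which is
the reverse of the order in which the forward steps ran.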
Don't read the whole builder - point at `setIntervals(Duration.ofHours(1))` and
`setJitter(...)` and say: "this is what your cron line wished it was." For Airflow
users: this replaces the scheduler, not your DAG logic.
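If someone asks what interval plus jitter means in practice, this sketch may
help. It is plain Python and assumes jitter is a random delay added on top of
each nominal fire time; `fire_times` and its parameters are hypothetical names,
and the 5-minute jitter and the seed are made-up values, not taken from the slide.

```python
"""Toy interval-plus-jitter schedule. Illustrative only; not the Temporal SDK."""
import random
from datetime import datetime, timedelta


def fire_times(start, interval, jitter, count, rng):
    """Return `count` fire times: one per interval, each delayed by up to `jitter`."""
    times = []
    for i in range(count):
        nominal = start + i * interval
        delay = timedelta(seconds=rng.uniform(0, jitter.total_seconds()))
        times.append(nominal + delay)
    return times


start = datetime(2024, 1, 1, 0, 0, 0)
interval = timedelta(hours=1)
jitter = timedelta(minutes=5)   # assumed value for the example

times = fire_times(start, interval, jitter, count=4, rng=random.Random(7))
for t in times:
    print(t.isoformat())
```

Each run lands somewhere in the first five minutes of its hour, so many
schedules with the same interval do not all fire at the same instant.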
This is the killer slide for product teams.
The "wait for human approval" pattern is often a custom-built monstrosity.
Here it's three lines.
The Workflow doesn't poll.
The Worker doesn't keep a thread.
The state lives in the cluster; when the Signal arrives, a Worker (maybe a
different one) picks up the Workflow and the await unblocks.
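To make "no polling, no thread" concrete for a skeptical engineer, a toy model
is enough. The sketch below is plain Python, not the Temporal SDK: the `cluster`
dict, `Worker`, `send_signal`, and `approval_workflow` are hypothetical names,
and the Workflow is modeled as a pure function of its event history.

```python
"""Toy human-approval wait driven by a Signal. Illustrative only; not the Temporal SDK."""

cluster = {}  # workflow_id -> list of events; stands in for the Temporal cluster


def approval_workflow(events):
    """Pure function of history: returns the current status without blocking."""
    for kind, payload in events:
        if kind == "signal" and payload["name"] == "approve":
            return "approved by " + payload["by"]
    return "waiting"  # nothing to do until a Signal is appended


class Worker:
    def __init__(self, name):
        self.name = name
        self.handled = []

    def run_task(self, workflow_id):
        # Load history, replay, return. No thread or timer is kept afterwards.
        status = approval_workflow(cluster[workflow_id])
        self.handled.append((workflow_id, status))
        return status


def send_signal(workflow_id, name, **payload):
    # The Signal is recorded in history; a Worker picks it up on the next task.
    cluster[workflow_id].append(("signal", {"name": name, **payload}))


cluster["expense-42"] = [("started", {})]
worker_a, worker_b = Worker("a"), Worker("b")

first = worker_a.run_task("expense-42")
del worker_a                                   # Worker A goes away entirely
send_signal("expense-42", "approve", by="dana")
second = worker_b.run_task("expense-42")       # a different Worker finishes the job

print(first)
print(second)
```

Between the two tasks nothing is running anywhere; the only thing that persists
is the history held by the stand-in cluster.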
Pivot to credibility/social proof.
The next two slides answer the implicit question: "who else is using this and what
for?"
Pick the one that matches the audience.
If they're in fintech, dwell on payments.
If they're a platform team, dwell on infrastructure provisioning.
The AI agents bullet is newest and lands hardest in 2024+ rooms.
Skim.
The point is breadth - this isn't a niche tool.
Stripe is the strongest name for fintech; Snap for consumer-scale products.
Netflix for data platforms.
Datadog for SRE-leaning teams.
If the audience asks for case studies later, point them at
https://temporal.io/case-studies.
Don't read the table - point at one row and discuss.
The most useful row is "Long human waits" because it surprises people.
Cron and Airflow are not built for "wait for a human for 3 days." The "Vendor
neutral" row matters for Step Functions skeptics.
Critical slide for credibility.
If you don't show the limits, the audience suspects you're selling.
The runbook line at the bottom is the test: "if the next person on call would need
a runbook to recover, it's a Workflow."
This is a vocabulary reset before the exercise.
Many teams use "runbook" and "playbook" interchangeably.
For this talk, make the distinction operational: runbook is reactive recovery;
playbook is repeatable coordination.
Do not make either one sound bad.
A good SRE team needs both.
The key point is that repeated runbook execution is evidence that the system has
pushed application state recovery onto humans.
Use this as the memory hook.
The important distinction is containment: a playbook can contain runbooks, but it
also carries scenario judgment, branching, and coordination.
Use examples:
- Runbook: "payment charged but order not shipped" recovery.
- Playbook: "new enterprise customer onboarding" or "security exception
approval" where the steps are known but involve humans and systems.
This sets up the discussion exercise: participants should classify their own
workflow as runbook-shaped, playbook-shaped, or not Temporal-shaped.
Online ILT flow:
1. Set a 5-minute timer and ask everyone to type their answer privately first.
2. At 2 minutes, ask them to paste a short version in chat:
"<workflow> / <failure mode> / <Temporal-shaped or not>".
3. If the platform supports breakouts and the group has more than 8 people, use
3-minute pairs before chat share-out. Otherwise keep it all in chat.
4. Pick two examples: one strong Temporal fit and one non-fit. Ask each person
to unmute for 30 seconds only if they are comfortable.
5. Close by tying answers back to the runbook test: if the next on-call would
need step-by-step recovery instructions, it may be a Workflow.
We're 20 minutes in.
The last 5 minutes are about giving them an action.
The mistake here is recommending a big migration.
Don't.
Recommend ONE workflow.
Step 5 is the lesson.
The biggest mistake teams make is REDESIGNING during the migration.
Don't redesign.
Move first, then improve.
Mechanical migration is faster, easier to verify, and lets you compare apples to
apples.
End with a concrete action.
If the room is on laptops, ask them to run it.
The dev server genuinely takes about 5 minutes to get running; the claim is fair.
For remote audiences, this is the screen they screenshot.
The three things they should remember.
Read them slowly.
Each one maps to the agenda's central claim.
If they only remember the "runbook → Workflow" habit, the talk worked.
Close strong.
Land on the quote.
Don't immediately segue to Q&A - let it sit.
Then: "Questions?"
Leave on screen during Q&A.
The slides URL gets photographed; make sure the QR code works if you've added one
for in-person events.