Jan 26, 2025

Designing Idempotent Payment Systems

How I think about retries, duplicate webhooks, race conditions, and safe payment state transitions.

Payments · Backend · Distributed Systems

Payments are where backend theory meets business reality. A flaky UI can annoy users. A broken payment flow destroys trust immediately.

In every payment system, one ugly truth shows up fast: the same event can arrive more than once. Retries happen. Webhooks are duplicated. Users refresh confirmation pages. Support teams trigger manual actions. Network failures create ambiguity.

If the system treats every event as if it were new, it eventually processes payments twice, grants duplicate access, issues incorrect invoices, or breaks accounting.

That is why idempotency is not a nice-to-have. It is the foundation.

What idempotency means in practice

Idempotency means that repeating the same valid request creates no additional side effects once it has been processed successfully the first time.

In a payment flow, that usually means:

  • an order should not become paid twice
  • access should not be granted twice
  • invoice generation should not duplicate
  • CRM updates should not fan out repeatedly
  • state transitions should remain valid even if retries happen
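
To make the list above concrete, here is a minimal Python sketch of one common approach: key each request with an idempotency key and store the result under it. The names `Order`, `PaymentProcessor`, `mark_paid`, and `access_grants` are hypothetical, and the in-memory dict is an assumption standing in for a database table with a unique constraint on the key.

```python
from dataclasses import dataclass, field


@dataclass
class Order:
    order_id: str
    status: str = "pending"
    access_grants: int = 0  # counts side effects so duplicates would be visible


@dataclass
class PaymentProcessor:
    # Hypothetical in-memory store of results, keyed by idempotency key.
    # A real system would persist this with a unique constraint on the key.
    results: dict = field(default_factory=dict)

    def mark_paid(self, order: Order, idempotency_key: str) -> str:
        # A repeated key returns the stored result and skips the side effects.
        if idempotency_key in self.results:
            return self.results[idempotency_key]

        # First successful processing: apply side effects exactly once.
        order.status = "paid"
        order.access_grants += 1
        self.results[idempotency_key] = order.status
        return order.status


order = Order("order-1")
processor = PaymentProcessor()
processor.mark_paid(order, "key-abc")
processor.mark_paid(order, "key-abc")  # retry with the same key: no new side effect
```

After both calls the order is paid and access has been granted once, which is the property the list above asks for.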

The main sources of duplicate processing

These are the repeat offenders:

  • payment gateway webhooks retried after timeout
  • the frontend calling the verify-payment endpoint multiple times
  • users clicking the pay button again
  • job workers replaying stuck messages
  • race conditions between sync API verification and async webhooks

Most payment bugs I have seen were not logic bugs. They were timing bugs.
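
One way to defuse these timing bugs is to make "have I seen this event?" and "mark it as seen" a single atomic step. The Python sketch below does that with a lock around a set. `EventDeduplicator`, `claim`, and `handle_payment_event` are hypothetical names, and the in-process lock is an assumption: across several servers the same role is usually played by a unique constraint or an atomic insert in a shared store.

```python
import threading


class EventDeduplicator:
    """Atomically claims an event ID so only one caller processes it."""

    def __init__(self):
        self._seen = set()
        self._lock = threading.Lock()

    def claim(self, event_id):
        # Check and insert under one lock, so two racing callers
        # cannot both observe the ID as new.
        with self._lock:
            if event_id in self._seen:
                return False
            self._seen.add(event_id)
            return True


def handle_payment_event(dedup, event_id, processed):
    # Called by both the sync verify path and the async webhook path.
    if dedup.claim(event_id):
        processed.append(event_id)  # stands in for the real side effects


dedup = EventDeduplicator()
processed = []
threads = [
    threading.Thread(target=handle_payment_event, args=(dedup, "evt_1", processed))
    for _ in range(20)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

However the twenty threads interleave, only one of them wins the claim. Note that this sketch claims the event before doing the work, so a production version would also need to handle a crash between the claim and the side effects.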

State machine thinking matters

Payment systems become manageable once you force them into explicit states.

Typical states:

  • initiated
  • pending
  • paid
  • partially_paid
  • failed
  • cancelled
  • refunded

The rule is simple: not every state can move to every other state.

Examples:

  • paid to paid should be a no-op
  • failed to paid may be allowed if the gateway later confirms success
  • cancelled to paid should trigger manual review in many systems
  • refunded to paid should almost never happen silently

When state transitions are explicit, retries become much safer.
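
A transition table is a small amount of code. The Python sketch below encodes the states and the four example rules from this section. Everything else is an assumption made for illustration: the `Outcome` names, the remaining edges in `ALLOWED`, treating any same-state transition as a no-op, and routing refunded to paid to manual review. Your own table should come from your gateway and your business rules.

```python
from enum import Enum


class State(Enum):
    INITIATED = "initiated"
    PENDING = "pending"
    PAID = "paid"
    PARTIALLY_PAID = "partially_paid"
    FAILED = "failed"
    CANCELLED = "cancelled"
    REFUNDED = "refunded"


class Outcome(Enum):
    APPLIED = "applied"
    NO_OP = "no_op"
    MANUAL_REVIEW = "manual_review"
    REJECTED = "rejected"


# Illustrative edges only; failed -> paid follows the example in the text.
ALLOWED = {
    State.INITIATED: {State.PENDING, State.PAID, State.FAILED, State.CANCELLED},
    State.PENDING: {State.PAID, State.PARTIALLY_PAID, State.FAILED, State.CANCELLED},
    State.PARTIALLY_PAID: {State.PAID, State.REFUNDED},
    State.PAID: {State.REFUNDED},
    State.FAILED: {State.PAID},
}

# Transitions that should never be applied silently.
NEEDS_REVIEW = {
    (State.CANCELLED, State.PAID),
    (State.REFUNDED, State.PAID),
}


def transition(current, target):
    """Return (new_state, outcome) without ever raising on a retry."""
    if current == target:
        return current, Outcome.NO_OP  # e.g. paid -> paid
    if (current, target) in NEEDS_REVIEW:
        return current, Outcome.MANUAL_REVIEW  # state stays put until a human decides
    if target in ALLOWED.get(current, set()):
        return target, Outcome.APPLIED
    return current, Outcome.REJECTED
```

Because `transition` returns an outcome instead of mutating blindly, a duplicate webhook for an already paid order simply comes back as `NO_OP`, and the caller can skip invoices, access grants, and CRM updates.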

Final thought

Payment systems become reliable when they stop assuming perfect sequencing. The real world is noisy. Requests retry, providers resend, users click twice, and queues delay work.

Design for repetition from the beginning. That is what keeps money flows trustworthy.