How we built a closed deliverability loop — and why no one else has
The problem with email infrastructure today
Most email APIs do one thing well: they wrap a deliverability provider behind a nicer SDK and let you POST a JSON payload to send mail. That's a useful service. It's also where most products stop.
The problem is that sending the email is not the interesting part. The interesting parts — the parts that determine whether your auth emails arrive, whether your domain reputation degrades, whether Gmail starts quietly dropping you into spam — happen *after* the API returns 202 "accepted". And almost none of those signals are exposed to you in the same product.
Pull on that thread and you find three separate concerns that every serious sender has to solve:
1. Did this specific message get delivered? Most APIs return a queue receipt and make you wait for a webhook. Webhooks are unreliable, eventually consistent, and useless for AI agents that need to make a decision in the same loop.
2. Who manages SPF, DKIM, DMARC, MTA-STS, BIMI? You do. Forever. Every time SES rotates a DKIM key, every time a new mailbox provider raises the bar (Gmail's Feb 2024 bulk-sender rules, anyone?), you re-publish DNS records and hope nothing breaks.
3. How does the world actually treat your mail? This is what DMARC aggregate reports tell you. They're delivered to whatever mailbox you publish in your rua tag. Most senders publish nothing, get nothing, and fly blind. The senders who do publish typically pay $200+/month for dmarcian, Valimail, or Postmark Spam Score to parse the XML and surface the data. Three concerns, three vendors, three dashboards, three integration points.
The result for SMB SaaS is predictable: you ship send code, you assume it works, you find out it doesn't from a support ticket six weeks later when a customer says "I never got the password reset email." By then your domain reputation has cratered and you don't know why.
We built Truncus because we needed all three solved in one product, on infrastructure we own, with no third-party dependencies on the deliverability path. This post is how we did it.
send_sync: returning truth in the same HTTP response
Here's how a typical email API call looks today:
POST /v1/emails/send
→ 202 Accepted { "id": "msg_abc123" }
The 202 means *we accepted your JSON*. Not *we delivered your email*. To find out what actually happened, you wait for a webhook to arrive at /your/webhook/handler. Maybe in 200ms. Maybe in 30 seconds. Maybe never if the webhook delivery itself fails.
That gap is fine for fire-and-forget marketing batches. It is unworkable for two newer use cases:
- AI agents that need to know whether a message landed before deciding what to do next. An agent that books a meeting and emails the confirmation can't proceed to "schedule the calendar invite" if it doesn't know whether the confirmation went through.
- Synchronous flows in your product — order confirmations, magic links, password resets — where the user is waiting on the screen. Your UI either lies ("Email sent!" — was it?) or you build async state plumbing for a one-shot operation.
send_sync is our answer. The HTTP request blocks until SES has actually finished the SMTP exchange with the recipient's mail server, then returns the result inline:
POST /v1/emails/send_sync
→ 200 OK {
"gm_id": "gm_01HX2K9...",
"status": "delivered",
"duration_ms": 1042,
"smtp_response": "250 2.0.0 OK"
}
Three things make this work under the hood:
1. SES delivery event notifications. Both of our SES configuration sets (truncus-saas and truncus-internal) have delivery event notifications enabled. When SES finishes the SMTP exchange with the recipient mail server, the event hits our webhook quickly enough to settle the in-flight HTTP response.
2. Synchronous poll with a 25-second ceiling. The handler polls our event store for a terminal state (delivered, bounced, or complained) up to a hard 25-second limit (we run on Vercel where the function ceiling matters). If the ceiling is hit, we return a deferred status with the gm_id and a hint to poll GET /api/v1/emails/{id} — never a hung connection.
3. Fall-through to the Operations API. If you need stricter semantics — at_least_once for retries, exactly_once for billing receipts — send_sync hands off to the Operations API, which provides those guarantees with content-hash deduplication (SHA-256, 10-minute window).
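The poll-with-ceiling behaviour in step 2 can be sketched as follows. This is an illustrative sketch, not the production handler: `settle`, the `lookup` callback, and the interval constants are assumed names, and the real handler queries the event store fed by SES delivery notifications rather than an injected function.

```typescript
type TerminalStatus = "delivered" | "bounced" | "complained";
type SettleResult =
  | { status: TerminalStatus; smtp_response: string }
  | { status: "deferred"; hint: string };

const CEILING_MS = 25_000;      // hard ceiling: stay under the Vercel function limit
const POLL_INTERVAL_MS = 250;   // assumed polling cadence for this sketch

async function settle(
  gmId: string,
  lookup: (id: string) => { status: TerminalStatus; smtp_response: string } | undefined,
  ceilingMs: number = CEILING_MS,
): Promise<SettleResult> {
  const deadline = Date.now() + ceilingMs;
  while (Date.now() < deadline) {
    // In production this is a query against the event store, not a callback.
    const event = lookup(gmId);
    if (event) return event; // terminal state reached: return it inline
    await new Promise((r) => setTimeout(r, POLL_INTERVAL_MS));
  }
  // Ceiling hit: never hang the connection; tell the caller where to poll.
  return { status: "deferred", hint: `GET /api/v1/emails/${gmId}` };
}
```

The key property is that every exit path returns a well-formed result: either a terminal SMTP outcome or an explicit deferral with a poll target.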
That gm_id is itself a design choice worth pausing on. It's a global message identifier (hence the gm_ prefix) that persists across every retry and would persist across providers if we failed over (today, the alternate-provider adapters are stubs — same-provider retries are what actually run). Your logs, your webhooks, your reporting all collapse to one row. Other providers give you a fresh ID per attempt and you reconcile by hand.
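The content-hash deduplication mentioned above reduces to a simple check. A minimal sketch, assuming an in-memory map and assuming the hash covers recipient, subject, and body (the real Operations API persists hashes in Postgres, and which fields feed the hash is an assumption here):

```typescript
import { createHash } from "node:crypto";

const WINDOW_MS = 10 * 60 * 1000;        // 10-minute dedup window
const seen = new Map<string, number>();  // content hash -> first-seen timestamp

function contentHash(to: string, subject: string, body: string): string {
  // SHA-256 over the message content, per the dedup scheme described above
  return createHash("sha256").update(`${to}\n${subject}\n${body}`).digest("hex");
}

function shouldSend(to: string, subject: string, body: string, now = Date.now()): boolean {
  const hash = contentHash(to, subject, body);
  const first = seen.get(hash);
  if (first !== undefined && now - first < WINDOW_MS) return false; // duplicate inside window
  seen.set(hash, now);
  return true;
}
```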
The latency reality: typical send_sync end-to-end is around 1 second for major recipient providers. That's slower than a 202 (which is ~50ms), and that's the point. You traded a meaningless 50ms for a truthful 1s.
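The payoff for the agent use case is that the next decision branches on the inline result rather than on a webhook that may never arrive. A minimal sketch of that branch (the result type and function name are assumed for illustration, not part of the API):

```typescript
type SendSyncResult = { gm_id: string; status: string; smtp_response?: string };

// Decide the agent's next action from the inline send_sync result.
function nextStep(r: SendSyncResult): "proceed" | "poll" | "alert" {
  if (r.status === "delivered") return "proceed"; // safe to continue (e.g. schedule the invite)
  if (r.status === "deferred") return "poll";     // ceiling hit: poll GET /api/v1/emails/{gm_id}
  return "alert";                                  // bounced or complained: surface it now
}
```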
Zero-touch DNS: delegation done right
Here's what an SMB customer using a typical email API has to manage in DNS, forever:
- An SPF record (`v=spf1 include:...`) that has to merge cleanly with anything else they send
- Three DKIM CNAMEs that the provider rotates on its own schedule
- A DMARC record (and they have to know what `p=quarantine` means)
- An MX record if they want inbound
- An MTA-STS policy file served over HTTPS plus a TXT record pointing to it
- A TLS-RPT record for failure reporting
Even a competent ops team puts off two of these and quietly ignores the rest. For a solo founder, the only viable strategy is "set it and pray".
The provider can't fix this for you because they don't own your DNS. They can give you copy-paste records and tell you to put them at your registrar — and that's where every email API stops.
We took the only other path: have the customer delegate a subdomain to us so we own everything inside it.
The customer flow is one DNS change, ever:
1. In the Truncus dashboard, choose a subdomain prefix (mail, send, or email) for their root domain (acme.com).
2. We provision a Route 53 hosted zone for mail.acme.com and return the four AWS NS hostnames.
3. The customer pastes those four NS records at their registrar — Cloudflare, GoDaddy, Namecheap, Squarespace, anything. NS records aren't proxiable, so this works for every provider.
4. Truncus's background poller verifies the delegation, writes SPF / DKIM / DMARC / MTA-STS / TLS-RPT records into the zone, and triggers SES verification. (A t.{sub} tracking CNAME is also published so we can flip on customer-domain click tracking later — see the note at the end of this section.)
5. ~5 minutes later, the domain is active.
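The record set written in step 4 can be sketched as a pure function of the delegated subdomain. Everything below is illustrative: the record values are generic placeholders (the actual SPF include, DMARC policy tags, and report addresses Truncus publishes aren't spelled out in this post), and `buildZoneRecords` is a hypothetical helper.

```typescript
interface DnsRecord { name: string; type: "TXT" | "CNAME"; value: string }

// Given the delegated subdomain (e.g. "mail.acme.com"), build the records
// the poller writes into the hosted zone. Values are placeholders.
function buildZoneRecords(sub: string): DnsRecord[] {
  return [
    { name: sub, type: "TXT", value: "v=spf1 include:amazonses.com -all" },
    { name: `_dmarc.${sub}`, type: "TXT", value: "v=DMARC1; p=quarantine; adkim=s; aspf=s" },
    { name: `_mta-sts.${sub}`, type: "TXT", value: "v=STSv1; id=20240101" },
    { name: `mta-sts.${sub}`, type: "CNAME", value: "mta-sts.truncus.co" }, // policy host
    { name: `_smtp._tls.${sub}`, type: "TXT", value: "v=TLSRPTv1; rua=mailto:tls@example.invalid" },
    { name: `t.${sub}`, type: "CNAME", value: "t.truncus.co" }, // future click-tracking host
  ];
}
```

(The three SES Easy DKIM CNAMEs are also written at this point; their names come back from the SES verification call, so they're omitted from the static sketch.)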
After that, the customer never touches DNS again. We:
- Run a weekly drift-correction cron that re-asserts SPF / DKIM / DMARC / MTA-STS records — so a manual edit in the AWS console can't silently break delivery. (Note: SES Easy DKIM keys themselves are managed and rotated by AWS; customer-controlled BYODKIM rotation is on the roadmap.)
- Publish DMARC at `p=quarantine` with strict alignment by default.
- Publish an MTA-STS policy file from `mta-sts.truncus.co` (CNAME'd from the customer's subdomain).
- Optionally enable inbound by adding an MX record (one-click toggle in the dashboard health view).
- Run a daily cleanup cron that flags zones whose NS delegation has been missing for 7 days and hard-deletes them after a 30-day grace period — so churned customers don't leave dead zones racking up the $0.50/mo Route 53 fee forever.
This is harder to copy than it looks. To do it correctly you need:
- Per-customer hosted zone provisioning in Route 53 (or equivalent), priced at $0.50/zone/mo.
- Authoritative DNS automation with idempotent UPSERT semantics (Route 53's `ChangeResourceRecordSets`).
- NS verification with multiple-resolver checks so propagation lag doesn't show false negatives.
- Drift correction that detects and re-publishes any record that's been deleted from the zone manually.
- Lifecycle for orphaned zones (the cleanup cron mentioned above).
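The drift-correction piece in the list above reduces to a set difference: compare what the cron wants published against what's actually in the zone, and UPSERT anything missing or changed (Route 53's UPSERT semantics make the re-assert idempotent). A sketch under assumed names:

```typescript
interface Rec { name: string; type: string; value: string }

// desired: the records the weekly cron asserts; actual: what's live in the zone.
// Returns the records to UPSERT: missing entirely, or present with a changed value.
function driftedRecords(desired: Rec[], actual: Rec[]): Rec[] {
  const key = (r: Rec) => `${r.name}|${r.type}`;
  const live = new Map(actual.map((r) => [key(r), r.value]));
  return desired.filter((r) => live.get(key(r)) !== r.value);
}
```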
The Route 53 part alone is roughly two days of work for someone who's never used the SDK before. The lifecycle automation is another week. Most ESPs don't bother because their customers haven't asked — and their customers haven't asked because they don't know to.
On-domain click tracking — what's shipped vs what's coming. Tracking links in emails sent through Truncus today are served from truncus.co URLs (e.g. truncus.co/api/t/c/{token}). We publish a t.{customer-subdomain} CNAME during delegation so we can flip on customer-domain tracking later, but it isn't active yet. The blocker is TLS: Vercel only issues HTTPS certs for domains explicitly added to a project, which doesn't scale to one cert per customer. The fix is on-demand TLS via Cloudflare for SaaS (or equivalent); when that lands, the env flag flips on and the existing CNAME starts serving. Until then, links use the truncus.co origin. We'd rather ship an honest fallback than a broken bullet point.
DMARC aggregator: closing the loop
Once a domain is sending real mail, mailbox providers (Gmail, Yahoo, Outlook, Proton, Fastmail, and dozens of smaller ones) start sending DMARC aggregate reports — XML documents containing every IP that sent mail claiming to be from your domain in the last day, whether it passed SPF/DKIM alignment, and what the receiving provider did with it.
These are the only outside view of how your domain is actually treated. Without them you're guessing.
The reports go to whatever mailbox you publish in your DMARC `rua` tag. Most senders publish `rua=mailto:dmarc@example.com`, never set up that mailbox, and lose every report. The slightly more sophisticated forward them to a third-party service like dmarcian for $200/month and up. The very few who run their own intake build the same pipeline we did:
DMARC RUA email → SES inbound → S3 bucket → Lambda → parsed XML → Postgres → dashboard
Our pipeline is set up so that when Gmail (or any provider) sends a report:
1. SES eu-west-1 receives the email at dmarc-reports@inbound.truncus.co (a separate subdomain so we don't conflict with whatever email routing exists on the apex).
2. An SES Receipt Rule writes the raw RFC 822 message to S3 (truncus-dmarc-inbound) and triggers a Lambda.
3. The Lambda reads the email, walks the MIME attachments, and finds the XML — handling the three formats providers actually send: .xml, .xml.gz (Google, Yahoo), and .zip containing one .xml (Microsoft).
4. The Lambda POSTs the decompressed XML to /api/internal/ingest-dmarc on truncus.co, authenticated with a shared secret.
5. The ingest endpoint parses the XML — extracting org name, report ID, time range, and per-record alignment / disposition counts — and writes one summary row per report into truncus_dmarc_reports (unique on (orgName, reportId)).
6. The data is queryable through /api/dashboard/domains/{id}/dmarc (returning per-org report counts, totals, and pass-rate over a 30-day window). The customer-facing visualisation that turns those rows into a chart is the next iteration; the data is being captured from day one so there's history to chart against.
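The attachment-normalisation and summary-parse steps (3 and 5 above) can be sketched as two small functions. This is a simplified illustration, not the Lambda source: function names are assumed, zip handling needs a zip library in the real Lambda and is stubbed here, and a real parser walks every `<record>` element rather than just the report metadata.

```typescript
import { gunzipSync } from "node:zlib";

// Normalise a MIME attachment to raw XML, per the three formats providers send.
function extractXml(filename: string, bytes: Buffer): string {
  if (filename.endsWith(".xml.gz")) return gunzipSync(bytes).toString("utf8"); // Google, Yahoo
  if (filename.endsWith(".zip")) {
    throw new Error("zip handling stubbed: extract the single .xml member first"); // Microsoft
  }
  return bytes.toString("utf8"); // plain .xml
}

// Minimal summary parse: pull the fields the unique key is built on
// out of <report_metadata>. Illustrative regex, not a full XML parser.
function parseSummary(xml: string): { orgName: string; reportId: string } {
  const get = (tag: string) =>
    xml.match(new RegExp(`<${tag}>([^<]+)</${tag}>`))?.[1] ?? "";
  return { orgName: get("org_name"), reportId: get("report_id") };
}
```

That `(orgName, reportId)` pair is exactly what the ingest endpoint's unique constraint is built on.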
A few details worth flagging:
Idempotency. Reports are unique on (orgName, reportId). Providers do retry, especially on transient failures. The upsert means duplicate deliveries are no-ops, not duplicate rows.
Lifecycle hygiene. The S3 bucket has a 7-day expiry on all objects. Lambda deletes successfully-ingested objects immediately; the lifecycle is the safety net for failed processing. We never accidentally hoard customer mail.
No third-party hop. The path is SES → S3 → Lambda → our Postgres → our dashboard. No data leaves AWS eu-west-1 except the final Lambda → ingest-endpoint hop, which is also EU-resident (Vercel fra1).
One failure mode that matters. If mail.truncus.co (or any sender domain) hasn't sent enough volume to a provider, that provider won't generate aggregate reports. There's no fix for this; it's just how DMARC works. We surface "0 reports yet" as a real signal in the dashboard rather than hiding it.
The architectural simplicity is the point. There's nothing exotic here — every piece is a 2014-era AWS primitive. What's unusual is having all three layers (transport, ownership, perception) live in the same product, on the same dashboard, billed at the same SMB price point.
Why this matters more than features
Each individual piece exists somewhere:
- Synchronous delivery: a few enterprise-tier vendors offer it as a paid add-on
- DNS automation: registrars do part of it; nobody does the full ESP delegation
- DMARC ingestion: dmarcian, Valimail, Postmark Spam Score, and similar tools
What no one ships is the loop closed end-to-end at the SMB tier. You can buy any one piece for an SMB budget. You cannot buy all three integrated.
Three buyer profiles notice the difference immediately:
AI agent platform builders. A LangChain or Vercel AI SDK agent that emails a customer can't proceed if it doesn't know whether the email landed. send_sync is the only API surface that lets the next decision happen in the same execution context. Webhook-based confirmation requires durable workflow state, which doesn't exist in an in-process agent loop.
Universities, healthcare, fintech. These buyers are increasingly required by their compliance frameworks to demonstrate "evidence of email deliverability" — not as a marketing claim but as an audit artifact. A DMARC aggregator dashboard with real reports going back 90 days *is* the evidence. Without it you fail the audit and lose the deal.
SaaS founders rebuilding their auth flow. The first time you have a customer say "I never got the magic link email" and you can pull up the actual delivery confirmation timestamp from send_sync and the SES SMTP response code, you understand viscerally why this matters. Before that, every email outage is a guessing game.
For everyone else — marketing-list senders, low-stakes notifications, products where email is a nice-to-have — this is overkill. We're fine with that. Truncus is positioned for the buyer for whom email is load-bearing.
What we built it on
For anyone evaluating the architecture risk:
| Layer | Technology |
|---|---|
| Transport | AWS SES eu-west-1 (50K/day, 14 sends/s) |
| DNS | Route 53 hosted zones (one per delegated subdomain) |
| Inbound DMARC | SES Receipt Rules → S3 → Lambda (Node.js 20) |
| Storage | Supabase PostgreSQL (Frankfurt, EU residency) |
| API + Dashboard | Next.js 15 on Vercel (fra1 region) |
| Region alignment | All EU. No transatlantic hops on the delivery or reporting paths. |
| Vendor count on the deliverability path | One (AWS). |
The "vendor count" line is the one we care about. Every additional vendor on the deliverability path is another company that can have an outage, change pricing, get acquired, or quietly downgrade their service. By owning everything from send_sync to DMARC ingestion, we control the dependencies that determine whether your email works.
That's the trade we made. Higher upfront engineering cost, lower long-term risk surface, and a product that's actually one product instead of three integrations pretending to be one.
Try it
The free tier is 3,000 emails/month with one seat — enough to verify send_sync works on your stack and to see the first DMARC reports show up in the dashboard a few days after you delegate a subdomain.
Your emails should always deliver.
Multi-provider failover, synchronous delivery confirmation, EU-first routing. Try Truncus free.