Email Service

This document explains the email-service design and behaviour when using Mailgun and AWS SES as delivery providers. It describes the delivery flow, provider-specific details (API, webhooks, credentials), failure handling, tracking, and operational guidance.

Responsibilities

  • Accept send jobs (from BullMQ worker / orchestrator).
  • Render templates, inject tracking, and produce provider-specific payloads.
  • Call provider APIs and handle responses (success, soft/hard failures).
  • Listen for and process provider webhooks to update message state and metrics.
  • Emit events (message.sent, message.delivered, message.bounced, message.opened) to Kafka or internal bus.
  • Expose observability metrics and audit logs for compliance.

Sequence (single message)

  1. Worker pulls job from BullMQ including message, template id, recipient, metadata and idempotency key.
  2. Worker posts job to Email Service (or performs send logic directly via adapter).
  3. Email Service renders template, injects tracking (click/open tracking links), attaches headers for idempotency and analytics.
  4. Email Service selects provider adapter (Mailgun or SES) using routing rules and circuit-breaker state.
  5. Adapter calls provider API; on a 2xx accepted response the message is marked sent and an event emitted.
  6. Provider delivers email; provider sends webhooks for delivered/bounced/opened/clicked.
  7. Webhook handler verifies signature, updates StepExecution/Message status and emits analytics events.
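
Steps 3 to 5 of the sequence can be sketched roughly as follows. This is a minimal illustration, not the service's actual code: the `SendJob`, `ProviderAdapter`, and `SentEvent` types, the `X-Idempotency-Key` header name, and the `circuitOpen` callback are all hypothetical, and tracking injection is left out for brevity.

```typescript
// Hypothetical types for illustration; none of these names come from the real service.
type ProviderName = "mailgun" | "ses";

interface SendJob {
  idempotencyKey: string;
  templateId: string;
  recipient: string;
  metadata: Record<string, string>;
}

interface ProviderAdapter {
  name: ProviderName;
  // Resolves with the provider message id once the provider accepts the message.
  send(payload: { to: string; html: string; headers: Record<string, string> }): Promise<string>;
}

interface SentEvent {
  type: "message.sent";
  provider: ProviderName;
  providerMessageId: string;
  idempotencyKey: string;
}

// Step 4: pick the first adapter, in routing order, whose circuit is not open.
function selectAdapter(
  adapters: ProviderAdapter[],
  circuitOpen: (name: ProviderName) => boolean,
): ProviderAdapter {
  const available = adapters.find((a) => !circuitOpen(a.name));
  if (!available) throw new Error("no provider available");
  return available;
}

// Steps 3-5 for a single job: render, select, send, emit.
async function handleSendJob(
  job: SendJob,
  render: (templateId: string, metadata: Record<string, string>) => string,
  adapters: ProviderAdapter[],
  circuitOpen: (name: ProviderName) => boolean,
  emit: (event: SentEvent) => void,
): Promise<SentEvent> {
  const html = render(job.templateId, job.metadata);
  const adapter = selectAdapter(adapters, circuitOpen);
  const providerMessageId = await adapter.send({
    to: job.recipient,
    html,
    headers: { "X-Idempotency-Key": job.idempotencyKey }, // assumed header name
  });
  const event: SentEvent = {
    type: "message.sent",
    provider: adapter.name,
    providerMessageId,
    idempotencyKey: job.idempotencyKey,
  };
  emit(event);
  return event;
}
```

Passing `render`, `circuitOpen`, and `emit` as parameters keeps the sketch testable without a template engine, breaker, or Kafka client.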

Provider comparison

Feature / Concern | Mailgun | AWS SES
--- | --- | ---
API type | HTTP REST, webhooks | AWS SDK (SendRawEmail), SMTP, SNS for events
Authentication | API key (secrets store) | IAM roles / keys (IAM role preferred)
Webhooks / Events | Native webhook endpoints (delivered/bounce/open/click/complaint) | SNS topics for bounces/complaints/deliveries; opens/clicks via configuration-set event publishing
Delivery reliability | High; provider-managed pools | High; AWS-managed with strong SLA
Rate limiting & quotas | Provider limits; per-domain quotas | Account-level quotas; can be raised on request to AWS
Cost model | Per-message + add-ons | Per-message + regional pricing; often cheaper at scale
Regional availability | Global endpoints | Region-specific (choose region for SES)
Setup complexity | Simple DNS for domain + webhooks | Verify domain + IAM + SNS wiring (slightly more ops)
Tracking (opens/clicks) | Built-in | Configuration-set open/click tracking, or custom tracking
Bounce / complaint handling | Webhooks & suppression lists | SNS notifications; integrate with suppression logic
Best for | Quick setup, advanced email features, developer ergonomics | Deep AWS integration, cost at scale, IAM security
Sandbox/testing | Mailgun sandbox domains | SES sandbox mode (limited recipients)

Provider-specific details

Mailgun

  • API: HTTP REST (POST to https://api.mailgun.net/v3/[domain]/messages or batch endpoints).
  • Authentication: API key (stored in secrets manager). Use per-domain keys or a single key scoped to the sending domain.
  • Capabilities: open/click tracking, tagging, templates (Mailgun templates can be used but rendering is usually done in-app), batch sending via recipient-variables.
  • Best practices:
    • Use a verified domain with correct SPF, DKIM, and DMARC records.
    • Use Mailgun webhooks for delivered, dropped, bounced, opened, clicked, complained.
    • Respect suppression lists; do not re-send to addresses in Mailgun suppression without a deliberate flow.
    • Attach analytics metadata with the X-Mailgun-Variables and X-Mailgun-Tag headers (or the equivalent v:* and o:tag parameters on the HTTP API).
  • Response handling:
    • 200/202 -> accepted. Treat as success and persist provider message id for later correlation.
    • 4xx (other than 429) -> client errors (bad request, validation): treat as non-retriable.
    • 5xx or 429 -> transient: retry with exponential backoff, honouring the Retry-After header if present.
  • Webhook security: verify the Mailgun signature (an HMAC-SHA256 of timestamp + token, keyed with the webhook signing key) and reject requests whose timestamp falls outside an acceptable window.
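
The response-handling rules above could be encoded along these lines. This is a sketch under assumptions: the `SendOutcome` type, the function names, and the backoff constants (1 s base, 60 s cap) are illustrative, and only the numeric (seconds) form of Retry-After is parsed.

```typescript
// Hypothetical outcome type for illustration.
type SendOutcome =
  | { kind: "accepted" }
  | { kind: "retry"; delayMs: number }
  | { kind: "fail" };

// Retry-After may be delta-seconds or an HTTP date; only the numeric form is handled here.
function parseRetryAfterMs(header: string | null): number | undefined {
  if (header === null || header.trim() === "") return undefined;
  const seconds = Number(header);
  return Number.isFinite(seconds) && seconds >= 0 ? seconds * 1000 : undefined;
}

// `attempt` is zero-based: 0 for the first try, 1 for the first retry, and so on.
function classifyResponse(status: number, retryAfter: string | null, attempt: number): SendOutcome {
  if (status >= 200 && status < 300) return { kind: "accepted" };
  if (status === 429 || status >= 500) {
    // Exponential backoff capped at 60 s, used when Retry-After is absent or unparseable.
    const backoffMs = Math.min(60_000, 1000 * 2 ** attempt);
    return { kind: "retry", delayMs: parseRetryAfterMs(retryAfter) ?? backoffMs };
  }
  return { kind: "fail" }; // remaining 4xx: non-retriable client error
}
```

With BullMQ, a `retry` outcome would typically translate into a delayed re-enqueue rather than an in-process sleep, so the worker slot is freed while waiting.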

Key notes for Mailgun-only:

  • Protect the Mailgun webhook endpoint with signature verification and, if possible, a narrow ingress rule (IP allowlist or firewall).
  • Persist Mailgun message-id returned on send for later correlation with webhook events.
  • Respect Mailgun suppression lists; optionally sync suppression state into your DB.
  • Monitor API error rates and use a circuit-breaker to avoid cascading failures; with no fallback provider, an open circuit should trigger alerts and optionally pause campaigns.
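
As a sketch of the signature check mentioned above: Mailgun signs each webhook with an HMAC-SHA256 over the concatenated timestamp and token, keyed with the webhook signing key, and sends the hex digest. The function name, the `MailgunSignature` type, and the 5-minute window below are assumptions for illustration.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// The three signature fields carried in a Mailgun webhook payload.
interface MailgunSignature {
  timestamp: string; // seconds since epoch, as a string
  token: string;
  signature: string; // hex-encoded HMAC-SHA256
}

function verifyMailgunSignature(
  sig: MailgunSignature,
  signingKey: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
  maxSkewSeconds: number = 300, // assumed acceptance window
): boolean {
  // Reject timestamps that cannot be parsed or fall outside the window.
  const ts = Number(sig.timestamp);
  if (!Number.isFinite(ts) || Math.abs(nowSeconds - ts) > maxSkewSeconds) return false;

  const expected = createHmac("sha256", signingKey)
    .update(sig.timestamp + sig.token)
    .digest();
  const received = Buffer.from(sig.signature, "hex");

  // timingSafeEqual throws on length mismatch, so check lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

A production handler would probably also remember recently seen tokens for the length of the window, so a captured request cannot be replayed inside it.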

AWS SES

  • API: SMTP or AWS SDK (SendRawEmail / SendTemplatedEmail). Prefer AWS SDK with IAM roles for servers in AWS.
  • Authentication: IAM roles (preferred for ECS/EKS tasks) or IAM user keys stored securely.
  • Capabilities: reputation metrics, sending statistics, and event publishing through configuration sets to SNS (deliveries, bounces, complaints, plus opens/clicks when open and click tracking is enabled on the configuration set; custom tracking links are an alternative).
  • Best practices:
    • Verify sending identities (domains or emails) and set up SPF/DKIM.
    • Use dedicated configuration sets to capture events and route them to SNS topics.
    • Use SNS -> HTTP(S) endpoint or SQS to receive SES events reliably.
    • For large volume, request production access and increase sending quotas.
  • Response handling:
    • AWS SDK returns message-id; treat as accepted on success.
    • Throttling (ThrottlingException) -> retry with exponential backoff and jitter.
    • Hard bounces and complaints do not appear in the API response; they arrive asynchronously via SNS notifications.
  • Security: use least-privilege IAM policies (ses:SendRawEmail, sns:Publish if needed). Rotate credentials regularly.
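
The "exponential backoff and jitter" rule for throttling might look like the sketch below. It uses the full-jitter variant (a random delay between zero and the exponential ceiling). The base, cap, and attempt count are illustrative, and the check on `err.name` assumes the SDK surfaces the error code there, as AWS SDK for JavaScript v3 does; the exact code SES returns for throttling should be confirmed against the SDK in use.

```typescript
// Full-jitter backoff: a random delay in [0, min(cap, base * 2^attempt)).
function backoffDelayMs(
  attempt: number,
  baseMs: number = 200,
  capMs: number = 20_000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Assumption: throttling errors carry one of these names.
function isThrottling(err: unknown): boolean {
  return err instanceof Error && (err.name === "ThrottlingException" || err.name === "Throttling");
}

// Retries `send` on throttling only; any other error is rethrown immediately.
async function sendWithRetry<T>(
  send: () => Promise<T>,
  maxAttempts: number = 5,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await send();
    } catch (err) {
      lastError = err;
      if (!isThrottling(err)) throw err;
      if (attempt + 1 < maxAttempts) await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```

Here `send` would wrap the actual SES call. Jitter matters most when many workers are throttled at once, since it spreads their retries instead of letting them collide again on the same schedule.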

Key notes for SES-only:

  • Configure SES configuration sets and route events to SNS topics for reliable delivery of bounces/complaints/deliveries.
  • Use IAM roles for ECS tasks so the adapter can call SES without embedding long-lived credentials.
  • Use SQS as a durable buffer between SNS and your webhook handler to ensure reliable processing and replay.
  • Monitor SES sending quotas and set alarms for ThrottlingException patterns; plan for auto-scaling workers if necessary.
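
For the SNS -> SQS path, the consumer has to unwrap the SNS envelope before it sees the SES event. The sketch below assumes raw message delivery is disabled on the subscription (so the SQS body is the SNS JSON envelope) and maps SES event types onto the internal event names. `message.complained` and `message.clicked` are hypothetical names that follow the pattern of the events listed under Responsibilities, and `ParsedSesEvent` is an illustrative type.

```typescript
type InternalEvent =
  | "message.delivered"
  | "message.bounced"
  | "message.complained" // hypothetical name
  | "message.opened"
  | "message.clicked"; // hypothetical name

interface ParsedSesEvent {
  event: InternalEvent;
  providerMessageId: string;
  hardBounce: boolean;
}

const SES_EVENT_MAP: Record<string, InternalEvent> = {
  Delivery: "message.delivered",
  Bounce: "message.bounced",
  Complaint: "message.complained",
  Open: "message.opened",
  Click: "message.clicked",
};

// Returns null for anything that is not a recognised SES event notification.
function parseSesEventFromSqsBody(body: string): ParsedSesEvent | null {
  const envelope = JSON.parse(body) as { Type?: string; Message?: string };
  if (envelope.Type !== "Notification" || typeof envelope.Message !== "string") return null;

  // Event publishing uses `eventType`; classic feedback notifications use `notificationType`.
  const ses = JSON.parse(envelope.Message) as {
    eventType?: string;
    notificationType?: string;
    mail?: { messageId?: string };
    bounce?: { bounceType?: string };
  };
  const kind = ses.eventType ?? ses.notificationType;
  const providerMessageId = ses.mail?.messageId;
  if (!kind || !providerMessageId) return null;

  const event = SES_EVENT_MAP[kind];
  if (!event) return null;

  return {
    event,
    providerMessageId,
    hardBounce: event === "message.bounced" && ses.bounce?.bounceType === "Permanent",
  };
}
```

The returned `providerMessageId` is what gets correlated with the message id stored at send time. Messages that parse to `null` can be logged and deleted from the queue rather than retried.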

Main considerations — Mailgun

  • Quick onboarding and developer-friendly REST API.
  • Strong built-in tracking and tagging features; convenient webhooks for event capture.
  • Use when you need fast setup, rich per-message metadata, and a provider-managed deliverability offering.
  • Operationally: ensure suppression lists are respected and signatures are validated. Monitor request quotas and set circuit-breakers for 5xx/429 patterns.
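
A circuit-breaker for 5xx/429 patterns can be quite small. The sketch below counts consecutive failures, opens after a threshold, and allows a trial send once a cooldown has elapsed. The class name, threshold, and cooldown are illustrative assumptions, not values taken from the service.

```typescript
type CircuitState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(
    failureThreshold: number = 5,
    cooldownMs: number = 30_000,
    now: () => number = () => Date.now(), // injectable clock for tests
  ) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  state(): CircuitState {
    if (this.openedAt === null) return "closed";
    return this.now() - this.openedAt >= this.cooldownMs ? "half-open" : "open";
  }

  // Sends are allowed when closed, and as a trial when half-open.
  canSend(): boolean {
    return this.state() !== "open";
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  // Call for 5xx and 429 responses; a failed trial send re-opens the circuit.
  recordFailure(): void {
    this.failures += 1;
    if (this.state() === "half-open" || this.failures >= this.failureThreshold) {
      this.openedAt = this.now();
    }
  }
}
```

In a single-provider deployment the transition to `open` is the natural place to raise an alert or pause campaigns, since there is no second adapter to route to.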

Main considerations — AWS SES

  • Best when you want to stay inside the AWS ecosystem and leverage IAM, SNS, and CloudWatch directly.
  • Typically lower cost at scale and integrates well with other AWS services (SNS, SQS, CloudWatch, Kinesis).
  • Use when you need predictable scaling and strong operational control; plan for SNS wiring and domain verification steps.

Single-provider flows (no runtime selection)

If your deployment uses a single provider (for example you commit to Mailgun only or SES only), the runtime flow is simpler because provider-selection and fallback logic are removed. Below are two focused flows and the small operational differences you should consider.

Operational differences summary

  • Webhook delivery: Mailgun posts directly to your HTTP webhook; SES pushes events to SNS (recommended) which can fan out to SQS/HTTP. SES + SNS + SQS is more resilient at the cost of extra setup.
  • Authentication: Mailgun uses API keys and webhook signatures; SES uses IAM (preferred) and SNS message verification.
  • Tracking: Mailgun provides built-in open/click tracking that is simple to enable; with SES, open/click tracking has to be enabled on a configuration set (or replaced with custom tracking redirects) and its events wired through SNS.
  • Failover: With a single provider there is no automatic in-app failover; rely on retries, alerting, and manual remediation instead.