Email Service
This document explains the email-service design and behaviour when using Mailgun and AWS SES as delivery providers. It describes the delivery flow, provider-specific details (API, webhooks, credentials), failure handling, tracking, and operational guidance.
Responsibilities
- Accept send jobs (from BullMQ worker / orchestrator).
- Render templates, inject tracking, and produce provider-specific payloads.
- Call provider APIs and handle responses (success, soft/hard failures).
- Listen for and process provider webhooks to update message state and metrics.
- Emit events (message.sent, message.delivered, message.bounced, message.opened) to Kafka or internal bus.
- Expose observability metrics and audit logs for compliance.
Sequence (single message)
- Worker pulls job from BullMQ including message, template id, recipient, metadata and idempotency key.
- Worker posts job to Email Service (or performs send logic directly via adapter).
- Email Service renders template, injects tracking (click/open tracking links), attaches headers for idempotency and analytics.
- Email Service selects provider adapter (Mailgun or SES) using routing rules and circuit-breaker state.
- Adapter calls provider API; on a 2xx accepted response the message is marked sent and an event emitted.
- Provider delivers email; provider sends webhooks for delivered/bounced/opened/clicked.
- Webhook handler verifies signature, updates StepExecution/Message status and emits analytics events.
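The send half of this sequence can be sketched as follows. This is a minimal illustration, not the service's actual code: `ProviderAdapter`, `SendJob`, `EmailService`, the `X-Idempotency-Key` header name, and the in-memory key store are all hypothetical. A real deployment would keep the idempotency record in a shared store rather than in process memory.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


class ProviderAdapter(Protocol):
    """Hypothetical adapter interface; Mailgun and SES adapters would implement it."""

    def send(self, recipient: str, subject: str, html: str, headers: dict) -> str:
        """Return the provider message id once the provider accepts the message."""
        ...


@dataclass
class SendJob:
    idempotency_key: str
    recipient: str
    subject: str
    html: str  # already rendered, with tracking links injected


@dataclass
class EmailService:
    adapter: ProviderAdapter
    emit: Callable[[str, dict], None]  # e.g. a thin wrapper around a Kafka producer
    _sent: dict = field(default_factory=dict)  # idempotency key -> provider message id

    def handle(self, job: SendJob) -> str:
        # A retried job with the same key must not produce a second email.
        if job.idempotency_key in self._sent:
            return self._sent[job.idempotency_key]
        headers = {"X-Idempotency-Key": job.idempotency_key}  # assumed header name
        provider_id = self.adapter.send(job.recipient, job.subject, job.html, headers)
        # Persist the provider id so later webhook events can be correlated.
        self._sent[job.idempotency_key] = provider_id
        self.emit("message.sent", {"provider_id": provider_id, "recipient": job.recipient})
        return provider_id
```

Because the idempotency check happens before the adapter call, a job that BullMQ redelivers returns the stored provider id and emits no second `message.sent` event.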
Provider comparison
| Feature / Concern | Mailgun | AWS SES |
|---|---|---|
| API type | HTTP REST, webhooks | AWS SDK (SendRawEmail), SMTP, SNS for events |
| Authentication | API key (secrets store) | IAM roles / keys (preferred IAM role) |
| Webhooks / Events | Native webhook endpoints (delivered/bounce/open/click/complaint) | SNS topics for bounces/complaints/deliveries; opens/clicks via configuration-set event publishing or Pinpoint |
| Delivery reliability | High; provider-managed pools | High; AWS-managed with strong SLA |
| Rate limiting & quotas | Provider limits; per-domain quotas | Account-level quotas; can be raised on request to AWS |
| Cost model | Per-message + add-ons | Per-message + regional pricing; often cheaper at scale |
| Regional availability | Global endpoints | Region-specific (choose region for SES) |
| Setup complexity | Simple DNS for domain + webhooks | Verify domain + IAM + SNS wiring (slightly more ops) |
| Tracking (opens/clicks) | Built-in | Configuration-set event publishing, Pinpoint, or custom tracking |
| Bounce / complaint handling | Webhooks & suppression lists | SNS notifications; integrate with suppression logic |
| Best for | Quick setup, advanced email features, developer ergonomics | Deep AWS integration, cost at scale, IAM security |
| Sandbox/testing | Mailgun sandbox domains | SES sandbox mode (limited recipients) |
Provider-specific details
Mailgun
- API: HTTP REST (POST to https://api.mailgun.net/v3/[domain]/messages or batch endpoints).
- Authentication: API key (stored in secrets manager). Use per-domain keys or a single key scoped to the sending domain.
- Capabilities: open/click tracking, tagging, templates (Mailgun templates can be used but rendering is usually done in-app), batch sending via recipient-variables.
- Best practices:
  - Use a verified domain with correct SPF, DKIM, and DMARC records.
  - Use Mailgun webhooks for delivered, dropped, bounced, opened, clicked, complained.
  - Respect suppression lists; do not re-send to addresses in Mailgun suppression without a deliberate flow.
  - Use message headers: X-Mailgun-Variables, X-Mailgun-Tag for analytics.
- Response handling:
  - 200/202 -> accepted. Treat as success and persist provider message id for later correlation.
  - 4xx -> client errors (bad request, validation): treat as non-retriable unless transient (e.g., rate-limit headers suggest retry-after).
  - 5xx or 429 -> transient: retry with exponential backoff. Observe the `Retry-After` header if present.
- Webhook security: validate Mailgun signature (timestamp + token + signature) and timestamp window.
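The response-handling rules above can be sketched as a small classifier. The names `Outcome` and `classify_response` are hypothetical, and the sketch assumes the caller passes headers with the exact key `Retry-After` (an HTTP client's case-insensitive header mapping would also work). Only the numeric-seconds form of `Retry-After` is honoured here.

```python
from enum import Enum
from typing import Mapping, Optional, Tuple


class Outcome(Enum):
    ACCEPTED = "accepted"  # persist the provider message id
    RETRY = "retry"        # transient: back off and try again
    FAILED = "failed"      # non-retriable client error


def classify_response(status: int, headers: Mapping[str, str]) -> Tuple[Outcome, Optional[float]]:
    """Map an HTTP status to an outcome and an optional retry delay in seconds."""
    if 200 <= status < 300:
        return Outcome.ACCEPTED, None
    if status == 429 or status >= 500:
        # Honour Retry-After only when it is a plain number of seconds;
        # the HTTP-date form is ignored in this sketch.
        raw = headers.get("Retry-After", "").strip()
        delay = float(raw) if raw.isdigit() else None
        return Outcome.RETRY, delay
    return Outcome.FAILED, None
```

When the returned delay is `None`, the caller would fall back to its own exponential backoff schedule.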
Key notes for Mailgun-only:
- Use Mailgun webhook endpoint protected by signature verification and a narrow ingress rule (IP or firewall) if possible.
- Persist Mailgun message-id returned on send for later correlation with webhook events.
- Respect Mailgun suppression lists; optionally sync suppression state into your DB.
- Monitor API error rates and use a circuit-breaker to avoid cascading failures; without fallback the circuit-breaker should trigger alerts and optionally pause campaigns.
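As an illustration of the signature check, the sketch below follows Mailgun's documented scheme: an HMAC-SHA256 over the concatenated timestamp and token, keyed with the webhook signing key and compared as a hex digest. The function name, the 300-second window, and the injectable `now` parameter are assumptions for this example. Replay protection by caching seen tokens is not covered.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_mailgun_signature(
    signing_key: str,
    timestamp: str,
    token: str,
    signature: str,
    max_age_seconds: int = 300,   # assumed freshness window
    now: Optional[float] = None,  # injectable clock for testing
) -> bool:
    """Return True when the webhook signature is valid and the timestamp is fresh."""
    current = time.time() if now is None else now
    try:
        age = abs(current - int(timestamp))
    except ValueError:
        return False  # malformed timestamp
    if age > max_age_seconds:
        return False  # too old (or too far in the future): possible replay
    expected = hmac.new(
        signing_key.encode(), (timestamp + token).encode(), hashlib.sha256
    ).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), signature.encode())
```

The webhook handler would reject the request (for example with a 4xx status) whenever this returns `False`, before touching any message state.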
AWS SES
- API: SMTP or AWS SDK (SendRawEmail / SendTemplatedEmail). Prefer AWS SDK with IAM roles for servers in AWS.
- Authentication: IAM roles (preferred for ECS/EKS tasks) or IAM user keys stored securely.
- Capabilities: reputation metrics via SES, sending statistics, and event publishing via SNS (deliveries, bounces, complaints; opens/clicks when open/click tracking is enabled on the configuration set, or via Amazon Pinpoint or custom tracking).
- Best practices:
  - Verify sending identities (domains or emails) and set up SPF/DKIM.
  - Use dedicated configuration sets to capture events and route them to SNS topics.
  - Use SNS -> HTTP(S) endpoint or SQS to receive SES events reliably.
  - For large volume, request production access and increase sending quotas.
- Response handling:
  - AWS SDK returns message-id; treat as accepted on success.
  - Throttling (ThrottlingException) -> retry with exponential backoff and jitter.
  - Hard bounces/complaints are delivered via SNS notifications.
- Security: use least-privilege IAM policies (ses:SendRawEmail, sns:Publish if needed). Rotate credentials regularly.
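The "exponential backoff and jitter" rule for throttling can be sketched as below. This uses the "full jitter" variant (a uniform random delay between zero and an exponentially growing ceiling), which is one common choice and not necessarily what the service uses. `backoff_delay`, `send_with_retry`, and the default parameters are hypothetical, and the caller supplies the predicate that decides whether an exception is transient (for example, a throttling error from the AWS SDK).

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 60.0) -> float:
    """Full-jitter delay: uniform in [0, min(cap, base * 2**attempt)]."""
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0.0, ceiling)


def send_with_retry(
    send: Callable[[], T],
    is_transient: Callable[[Exception], bool],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send() until it succeeds, retrying only transient failures."""
    for attempt in range(max_attempts):
        try:
            return send()
        except Exception as exc:
            # Give up on non-transient errors and on the final attempt.
            if not is_transient(exc) or attempt == max_attempts - 1:
                raise
            sleep(backoff_delay(attempt))
    raise ValueError("max_attempts must be at least 1")
```

Randomising the delay spreads retries from many workers over time, so a throttled account is not hit by synchronized retry bursts.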
Key notes for SES-only:
- Configure SES configuration sets and route events to SNS topics for reliable delivery of bounces/complaints/deliveries.
- Use IAM roles for ECS tasks so the adapter can call SES without embedding long-lived credentials.
- Use SQS as a durable buffer between SNS and your webhook handler to ensure reliable processing and replay.
- Monitor SES sending quotas and set alarms for ThrottlingException patterns; plan for auto-scaling workers if necessary.
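A consumer reading SES events from the SQS buffer might unwrap them roughly as follows. The sketch assumes raw message delivery is disabled on the SNS subscription, so each SQS body is the SNS JSON envelope with the SES event as a JSON string in `Message`. It accepts both `eventType` (configuration-set event publishing) and `notificationType` (identity notifications). The function name and the returned dictionary shape are hypothetical, and SNS signature verification is omitted.

```python
import json
from typing import Optional

# SES event kind -> internal event name (only the events this document lists).
EVENT_NAMES = {
    "Delivery": "message.delivered",
    "Bounce": "message.bounced",
    "Open": "message.opened",
}


def parse_ses_event(sns_body: str) -> Optional[dict]:
    """Turn an SNS-wrapped SES event into an internal event, or None to skip it."""
    envelope = json.loads(sns_body)
    if envelope.get("Type") != "Notification":
        return None  # e.g. SubscriptionConfirmation messages
    message = json.loads(envelope["Message"])
    kind = message.get("eventType") or message.get("notificationType")
    name = EVENT_NAMES.get(kind)
    if name is None:
        return None  # event kinds this sketch does not map
    event = {"name": name, "provider_id": message["mail"]["messageId"]}
    if kind == "Bounce":
        # "Permanent" bounces are hard bounces and should feed suppression logic.
        event["hard"] = message["bounce"]["bounceType"] == "Permanent"
    return event
```

The `provider_id` is the SES message id returned at send time, which is what allows the event to be matched to the stored message record.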
Main considerations — Mailgun
- Quick onboarding and developer-friendly REST API.
- Strong built-in tracking and tagging features; convenient webhooks for event capture.
- Use when you need fast setup, rich per-message metadata, and a provider-managed deliverability offering.
- Operationally: ensure suppression lists are respected and signatures are validated. Monitor request quotas and set circuit-breakers for 5xx/429 patterns.
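A circuit-breaker for 5xx/429 patterns could look like the minimal sketch below. The class name, thresholds, and the consecutive-failure counting rule are assumptions for illustration; production breakers often use a rolling error-rate window instead.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Opens after N consecutive failures, then allows traffic again after a cool-down."""

    def __init__(
        self,
        failure_threshold: int = 5,
        reset_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        """Return True when a request may be sent to the provider."""
        if self.opened_at is None:
            return True
        # Once the cool-down has elapsed, requests are let through again;
        # a further failure re-opens the breaker immediately.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        """Count a 5xx/429 response; open the breaker at the threshold."""
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()
```

In a single-provider deployment, the moment the breaker opens is a natural place to raise an alert or pause campaigns, since there is no second provider to fail over to.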
Main considerations — AWS SES
- Best when you want to stay inside the AWS ecosystem and use IAM, SNS, and CloudWatch directly.
- Typically lower cost at scale and integrates well with other AWS services (SNS, SQS, CloudWatch, Kinesis).
- Use when you need predictable scaling and strong operational control; plan for SNS wiring and domain verification steps.
Single-provider flows (no runtime selection)
If your deployment uses a single provider (for example, Mailgun only or SES only), the runtime flow is simpler because provider-selection and fallback logic are removed. Below are two focused flows and the small operational differences you should consider.
Operational differences summary
- Webhook delivery: Mailgun posts directly to your HTTP webhook; SES pushes events to SNS (recommended) which can fan out to SQS/HTTP. SES + SNS + SQS is more resilient at the cost of extra setup.
- Authentication: Mailgun uses API keys and webhook signatures; SES uses IAM (preferred) and SNS message verification.
- Tracking: Mailgun provides built-in open/click tracking; SES needs open/click tracking enabled through a configuration set, Pinpoint, or custom tracking redirects.
- Failover: With a single provider there is no automatic in-app failover; rely instead on retries, alerting, and manual remediation.