Event Processing Architecture
How Doone Flow processes webhook events and workflows at scale using message queues and asynchronous workers
Overview
Doone Flow uses an asynchronous, queue-based architecture to process webhook events and workflows reliably at scale.
When a webhook arrives, the system immediately accepts it and queues it for processing. This means your API responds instantly while workflow execution happens in the background. This design provides several important benefits:
- High Throughput: Webhooks are accepted instantly without waiting for execution to complete
- Reliability: Failed executions are automatically retried with exponential backoff
- Scalability: Multiple worker instances can process events in parallel
- Resilience: Messages are persisted in the queue, preventing data loss during outages
Architecture Flow
1. Webhook Request: Incoming event received
2. Validate Payload: Check schema & security
3. 200 OK: Event accepted
4. Message Queue: Events stored until processed (long polling, retry support, delay scheduling)
5. Worker Process: Background processing (poll queue, normalize data, prepare workflow context, execute workflow)
6. Workflow Execution: Run actions sequentially (SMS messages, CRM updates, webhook calls, AI processing)
7. Database: Store execution records
8. Notifications: Email & Slack (on failure)

Events that still fail after the maximum number of retries are routed to the Dead Letter Queue.
This flow traces the complete path from webhook ingestion through workflow execution, including retry mechanisms and error handling.
Key Components
1. Webhook Handler
The webhook handler is the entry point for all incoming webhook requests. It performs several critical functions:
- Authentication: Validates webhook security keys if configured
- Payload Validation: Validates incoming payloads against the configured JSON schema (if a trigger schema is defined)
- Organization Status: Checks if the organization is enabled before processing
- Queue Dispatch: Immediately sends the event to the internal message queue and returns a 200 OK response
The handler does not execute workflows directly. This separation ensures the API remains responsive even when workflows take time to process.
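As a rough sketch of this separation, a handler might validate and enqueue like the following. It assumes a FastAPI-style endpoint; the trigger lookup, validation helpers, header name, and `queue_client` are illustrative, not Doone Flow's actual implementation.

```python
# Illustrative sketch: the trigger lookup, validation helpers, header name,
# and queue_client are assumptions, not the actual Doone Flow code.
import json
from fastapi import FastAPI, HTTPException, Request

app = FastAPI()

@app.post("/webhook/{path}")
async def receive_webhook(path: str, request: Request):
    payload = await request.json()
    trigger = load_trigger_config(path)  # hypothetical: load workflow trigger settings

    # Authentication: reject requests with a bad security key (if one is configured)
    if trigger.security_key and request.headers.get("X-Webhook-Key") != trigger.security_key:
        raise HTTPException(status_code=401, detail="invalid security key")

    # Payload validation against the trigger's JSON schema (if defined)
    if trigger.schema and not validate_against_schema(payload, trigger.schema):  # hypothetical
        raise HTTPException(status_code=400, detail="payload failed schema validation")

    # Organization must be enabled before anything is queued
    if not trigger.organization_enabled:
        raise HTTPException(status_code=403, detail="organization disabled")

    # Dispatch to the internal queue and return immediately; no workflow runs here
    queue_client.send_message(body=json.dumps({"path": path, "payload": payload}))
    return {"status": "accepted"}
```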
2. Message Queue
A durable message queue acts as the broker between the webhook handler and worker processes. Key features:
- Message Persistence: Events are stored in the queue until processed, preventing data loss
- Visibility Timeout: Messages are hidden while being processed, preventing duplicate processing
- Long Polling: Workers use long polling to reduce empty responses and improve efficiency
- Delay Support: Messages can be delayed for scheduled actions
- Receive Count Tracking: Tracks how many times a message has been received, enabling retry limits
The queue configuration includes a Dead Letter Queue (DLQ) that receives messages that fail after maximum retry attempts or are identified as permanent errors.
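The documentation does not name a specific queue provider, but the features above (visibility timeout, long polling, receive counts, a DLQ) map directly onto an SQS-style queue. As an illustration only, such a queue could be configured like this; queue names, timeouts, and limits are placeholders:

```python
# Hypothetical SQS-style setup; queue names, timeouts, and maxReceiveCount are placeholders.
import json
import boto3

sqs = boto3.client("sqs")

dlq = sqs.create_queue(QueueName="doone-flow-events-dlq")
dlq_arn = sqs.get_queue_attributes(
    QueueUrl=dlq["QueueUrl"], AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

sqs.create_queue(
    QueueName="doone-flow-events",
    Attributes={
        "VisibilityTimeout": "300",             # hide in-flight messages from other workers
        "ReceiveMessageWaitTimeSeconds": "20",  # enable long polling by default
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dlq_arn,
            "maxReceiveCount": "5",             # after 5 failed receives, move to the DLQ
        }),
    },
)
```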
3. Worker Process
The worker process is a long-running background service that continuously polls the queue and processes events. The worker handles:
- Message Reception: Polls the message queue with long polling to receive events
- Retry Management: Checks receive count and handles messages that exceed retry limits
- Event Normalization: Normalizes trigger data using either schema-based or legacy normalization
- Workflow Execution: Executes workflow actions sequentially
- Error Handling: Distinguishes between transient and permanent errors
- Delayed Events: Processes delayed events from the database for longer delays
Multiple worker instances can run in parallel, allowing horizontal scaling to handle increased load.
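A rough sketch of that polling loop is shown below, again assuming an SQS-style client; `process_event`, `record_failed_execution`, the queue URL, and the retry limit are hypothetical.

```python
# Worker loop sketch; helper functions, queue URL, and limits are assumptions.
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.example.com/doone-flow-events"  # placeholder
MAX_RECEIVE_COUNT = 5                                    # placeholder retry limit

def run_worker():
    while True:
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL,
            MaxNumberOfMessages=10,
            WaitTimeSeconds=20,                          # long polling
            AttributeNames=["ApproximateReceiveCount"],
        )
        for msg in resp.get("Messages", []):
            event = json.loads(msg["Body"])
            receives = int(msg["Attributes"]["ApproximateReceiveCount"])

            # Retry management: give up once the message has been delivered too many times
            if receives > MAX_RECEIVE_COUNT:
                record_failed_execution(event, reason="max retries exceeded")  # hypothetical
                sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
                continue

            try:
                process_event(event)  # hypothetical: normalize data, then execute the workflow
                sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
            except Exception:
                # Leave the message in the queue: it becomes visible again after the
                # visibility timeout and is retried automatically.
                pass
```

Because each message is claimed by whichever worker receives it first, running more instances of this loop scales processing horizontally without extra coordination.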
4. Workflow Orchestrator
The workflow orchestrator coordinates the execution of workflow actions:
- Action Execution: Executes actions in sequence (SMS, CRM updates, webhooks, AI steps, etc.)
- Delay Handling: For delay actions, re-queues the event with a delay, or stores it in the database for longer delays
- Retry Logic: Implements exponential backoff for failed actions with automatic retry limits
- Execution Tracking: Creates and updates workflow execution records in the database
- Failure Notifications: Sends email and Slack notifications when workflows fail (if configured)
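A simplified sketch of that coordination, with all handler and model names made up for illustration:

```python
# Hypothetical orchestrator; the action handlers, execution model, and notifier
# are assumptions about structure, not the actual implementation.
ACTION_HANDLERS = {
    "sms": send_sms,            # e.g. Twilio
    "crm_update": update_crm,   # e.g. Salesforce
    "webhook": call_webhook,    # HTTP request to an external endpoint
    "ai": run_ai_step,          # enrichment, generation, or decision step
    "delay": schedule_delay,    # re-queue with a delay or persist for longer delays
}

def execute_workflow(workflow, context, execution):
    for action in workflow.actions:                 # actions run strictly in order
        handler = ACTION_HANDLERS[action.type]
        try:
            result = handler(action, context)
            execution.record_action(action, status="completed", result=result)
        except Exception as exc:
            execution.record_action(action, status="failed", error=str(exc))
            execution.mark_failed()
            notify_failure(workflow, execution)     # email / Slack, if configured
            raise                                   # let queue-level retry logic take over
    execution.mark_completed()
```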
Detailed Processing Flow
Step 1: Webhook Reception
- Webhook request arrives at /webhook/{path}
- Handler validates security key (if configured)
- Handler validates payload against trigger schema (if configured)
- Handler checks organization status (must be enabled)
- Event is serialized and sent to the internal message queue
- API immediately returns {"status": "accepted"}
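From the sender's side, the exchange looks like this; the URL, payload, and header name are placeholders:

```python
# Placeholder endpoint, payload, and security-key header; adjust to your trigger.
import requests

resp = requests.post(
    "https://api.example.com/webhook/new-lead",
    json={"email": "jane@example.com", "source": "landing-page"},
    headers={"X-Webhook-Key": "your-security-key"},
    timeout=10,
)
print(resp.status_code)  # 200 when the event is accepted
print(resp.json())       # {"status": "accepted"}
```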
Step 2: Queue Processing
- Worker polls the queue with long polling
- If a message is received, the worker checks how many times it has already been attempted
- If the message exceeds the maximum retry attempts, the worker creates a failed execution record and deletes the message
- Otherwise, the worker processes the event
Step 3: Event Normalization
- Worker retrieves workflow and trigger configuration from the database
- If a trigger schema is configured:
  - Normalizes the payload using schema field mappings
  - Merges original payload fields as a fallback to ensure all required data is available
- If no schema is configured (legacy):
  - Uses the normalization service to extract and structure data fields
- If normalization fails, creates a failed execution record and deletes the message
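A simplified sketch of the schema-based path, where the mapping format and field names are invented for illustration:

```python
# Illustrative only: the real schema/mapping format is not documented here.
def normalize_with_schema(payload: dict, field_mappings: dict) -> dict:
    """Map incoming payload keys to canonical names, e.g. {"email_address": "email"}."""
    normalized = {
        canonical: payload[source]
        for source, canonical in field_mappings.items()
        if source in payload
    }
    # Merge the original payload underneath the mapped fields so that required
    # data missing from the mappings is still available to later actions.
    return {**payload, **normalized}

# normalize_with_schema({"email_address": "a@b.co", "utm": "ad"}, {"email_address": "email"})
# -> {"email_address": "a@b.co", "utm": "ad", "email": "a@b.co"}
```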
Step 4: Workflow Execution
- Worker creates a workflow execution record in the database
- Executes actions sequentially:
  - Delay Actions: Re-queues the event with a delay or stores it in the database for longer delays
  - SMS Actions: Sends SMS via Twilio
  - CRM Actions: Updates Salesforce or other CRM systems
  - Webhook Actions: Makes HTTP requests to external endpoints
  - AI Actions: Processes AI enrichment, generation, or decision steps
- For each action failure:
  - If within the retry limit: Re-queues with exponential backoff
  - If the retry limit is exceeded: Sends to the DLQ and marks the action as failed
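For the delay action in particular, one plausible split (an assumption, since the actual cutoff is not documented) is to use the queue's native delay for short waits and the database for anything longer:

```python
# Hypothetical delay handling; the 15-minute threshold matches the maximum
# DelaySeconds of an SQS-style queue and is an assumed cutoff.
import json
from datetime import datetime, timedelta, timezone

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.example.com/doone-flow-events"  # placeholder
MAX_QUEUE_DELAY_SECONDS = 900                            # 15 minutes

def schedule_delay(event: dict, delay_seconds: int):
    if delay_seconds <= MAX_QUEUE_DELAY_SECONDS:
        sqs.send_message(
            QueueUrl=QUEUE_URL,
            MessageBody=json.dumps(event),
            DelaySeconds=delay_seconds,       # the queue redelivers after the delay
        )
    else:
        resume_at = datetime.now(timezone.utc) + timedelta(seconds=delay_seconds)
        save_delayed_event(event, resume_at)  # hypothetical DB write; the worker resumes it later
```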
Step 5: Completion
- If workflow completes successfully, updates execution status to "completed"
- If workflow fails, updates execution status to "failed" and sends failure notifications (if configured)
- Worker deletes message from the queue
- Execution record is available in the UI for monitoring and debugging
Error Handling & Retry Logic
Permanent vs. Transient Errors
The system distinguishes between permanent errors (failures that will not succeed on retry) and transient errors (temporary failures that may succeed when retried):
Permanent Errors
- Missing required fields in payload
- Invalid UUID or malformed data
- Invalid configuration or missing resources
These are immediately deleted from the queue and marked as failed.
Transient Errors
- Network timeouts
- Temporary service unavailability
- Rate limiting
- Database connection issues
These are retried with exponential backoff until the retry limit is reached.
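One way to express that distinction in code, with exception types chosen purely for illustration:

```python
# Illustrative classification; the actual exception taxonomy is an assumption.
PERMANENT_ERRORS = (KeyError, ValueError)           # missing fields, malformed UUIDs, bad config
TRANSIENT_ERRORS = (TimeoutError, ConnectionError)  # timeouts, unavailability, rate limits

def handle_processing_error(exc: Exception, msg):
    if isinstance(exc, PERMANENT_ERRORS):
        # Retrying will not help: record the failure and drop the message immediately.
        record_failed_execution(msg, reason=str(exc))  # hypothetical
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
    else:
        # Transient (or unknown) errors: leave the message in flight so it reappears
        # after the visibility timeout and is retried until the receive limit is hit.
        pass
```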
Retry Strategy
- Action-Level Retries: Individual actions are automatically retried with exponential backoff until the retry limit is reached
- Message-Level Retries: The queue automatically retries messages that aren't deleted within the visibility timeout
- Max Receive Count: Messages that exceed the maximum receive count are considered failed and deleted
- Dead Letter Queue: Failed actions after max retries are sent to the DLQ for manual inspection
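The exact schedule is not documented; a typical exponential backoff with a cap and jitter looks like this:

```python
# Typical exponential backoff with jitter; the base, cap, and attempt limit are assumptions.
import random

BASE_DELAY_SECONDS = 30
MAX_DELAY_SECONDS = 900

def backoff_delay(attempt: int) -> int:
    """1-based attempt number: 30s, 60s, 120s, 240s, 480s, then capped at 900s."""
    delay = min(BASE_DELAY_SECONDS * 2 ** (attempt - 1), MAX_DELAY_SECONDS)
    return delay + random.randint(0, 10)  # jitter spreads out retries from many workers
```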
Scalability & Performance
Horizontal Scaling
Multiple worker instances can run simultaneously, each polling the same message queue. The queue automatically distributes messages across workers, enabling:
- Parallel processing of multiple workflows
- Increased throughput during peak loads
- Fault tolerance (if one worker fails, others continue)
Performance Optimizations
- Long Polling: Reduces empty responses and API calls
- Immediate API Response: Webhooks return instantly without waiting for execution
- Connection Pooling: Efficient database and integration connection management
- Asynchronous Processing: All workflow execution happens off the request path
Monitoring & Observability
All workflow executions are tracked in the database, providing comprehensive observability:
- Execution Records: Every workflow execution is stored with status, timestamps, and error messages
- Action Execution Tracking: Individual action executions are tracked with results and retry counts
- Raw Payload Storage: Original webhook payloads are stored for debugging and replay
- Trigger Data: Normalized trigger data is stored for reference
- Failed Execution Visibility: Failed executions appear in the UI for investigation
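The stored record might look roughly like the following; the field names are illustrative, not the actual schema:

```python
# Illustrative record shape; the real database schema is not documented here.
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Optional

@dataclass
class WorkflowExecution:
    id: str
    workflow_id: str
    status: str                        # "running" | "completed" | "failed"
    raw_payload: dict                  # original webhook body, kept for debugging and replay
    trigger_data: dict                 # normalized trigger data
    error_message: Optional[str] = None
    started_at: Optional[datetime] = None
    completed_at: Optional[datetime] = None
    actions: list[dict[str, Any]] = field(default_factory=list)  # per-action results and retry counts
```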
Best Practices
For Webhook Senders
- Always check the HTTP status code (200 = accepted, 400 = validation error, 500 = queue error)
- Implement retry logic for 500 errors (the event may be accepted and queued on a later attempt)
- Keep payloads under 1MB (enforced limit)
- Use webhook security keys for production workflows
- Validate payload structure matches your trigger schema
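A sender-side sketch that follows these guidelines; the endpoint, header, and backoff values are placeholders:

```python
# Placeholder endpoint and security key; retry counts and backoff are assumptions.
import time
import requests

def send_event(payload: dict, attempts: int = 3) -> bool:
    for attempt in range(1, attempts + 1):
        resp = requests.post(
            "https://api.example.com/webhook/new-lead",
            json=payload,
            headers={"X-Webhook-Key": "your-security-key"},
            timeout=10,
        )
        if resp.status_code == 200:
            return True              # accepted and queued
        if resp.status_code == 400:
            return False             # validation error: fix the payload, do not retry
        time.sleep(2 ** attempt)     # 500: transient queue error, back off and retry
    return False
```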