How does the document processing pipeline work?
At amaise, documents pass through a defined pipeline of multiple stages:
CREATED → OCR → SEGMENTATION → SPLITTING → INDEXING → EXTRACTION → ANALYSIS → ANSWERING → READY
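The stage order above can be modelled as an ordered enumeration, so that each stage always knows its successor. The following is a minimal sketch under that assumption. The names `Stage`, `PIPELINE` and `next_stage` are hypothetical and are not taken from amaise's code.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Pipeline stages in the documented order (hypothetical model)."""
    CREATED = 1
    OCR = 2
    SEGMENTATION = 3
    SPLITTING = 4
    INDEXING = 5
    EXTRACTION = 6
    ANALYSIS = 7
    ANSWERING = 8
    READY = 9


# Iterating over an Enum yields members in definition order, so this list is the pipeline order.
PIPELINE = list(Stage)


def next_stage(current: Stage) -> Optional[Stage]:
    """Return the stage that follows `current`, or None once the document is READY."""
    index = PIPELINE.index(current)
    if index + 1 < len(PIPELINE):
        return PIPELINE[index + 1]
    return None
```

For example, `next_stage(Stage.OCR)` returns `Stage.SEGMENTATION`, and `next_stage(Stage.READY)` returns `None` because READY is the final state.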
Key features:
Idempotent workers: Each stage is handled by a standalone, stateless worker. Processing can be safely repeated in case of errors.
Asynchronous communication: Workers communicate via message queues (SQS). Each worker processes one task at a time.
Tenant separation: Each task is assigned to a specific tenant, and the same tenant isolation controls apply here as in the rest of the application.
Encrypted storage: Documents are stored in S3 with tenant-specific encryption keys.
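The worker properties above (stateless, idempotent, one task at a time, tenant-checked) can be illustrated with a small in-memory sketch. A standard-library `Queue` stands in for SQS, and a plain dict stands in for external, durable result storage. The `Task` schema, the `owners` lookup and the function names are all hypothetical.

```python
from dataclasses import dataclass
from queue import Queue
from typing import Dict, Tuple


@dataclass(frozen=True)
class Task:
    """One queue message: which tenant's document to process at which stage (hypothetical schema)."""
    tenant_id: str
    document_id: str
    stage: str


# Stand-in for external storage; the worker itself keeps no state between tasks.
ResultStore = Dict[Tuple[str, str, str], str]


def handle(task: Task, owners: Dict[str, str], results: ResultStore) -> str:
    """Process one task idempotently and return its result.

    `owners` maps document_id -> tenant_id, a hypothetical lookup standing in
    for the application's tenant isolation controls.
    """
    # Tenant separation: refuse tasks whose tenant does not own the document.
    if owners.get(task.document_id) != task.tenant_id:
        raise PermissionError(f"tenant {task.tenant_id} may not access {task.document_id}")

    key = (task.tenant_id, task.document_id, task.stage)
    # Idempotency: a retried or redelivered task returns the stored result
    # instead of doing the work (and its side effects) a second time.
    if key in results:
        return results[key]

    result = f"{task.stage} done for {task.document_id}"  # placeholder for real stage logic
    results[key] = result
    return result


def run_worker(queue: "Queue[Task]", owners: Dict[str, str], results: ResultStore) -> int:
    """Drain the queue one task at a time and return how many tasks were handled."""
    handled = 0
    while not queue.empty():
        task = queue.get()
        handle(task, owners, results)
        queue.task_done()
        handled += 1
    return handled
```

If the same task is delivered twice, the worker handles both deliveries, but the stage logic runs only once. With real SQS, a worker would typically delete a message only after it has been processed successfully. If processing fails, the message becomes visible again after its visibility timeout and is retried, which is why the idempotency check matters.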
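One way to apply tenant-specific keys in S3 is server-side encryption with AWS KMS (SSE-KMS), using one KMS key per tenant. The source does not say which S3 encryption mode is used, so SSE-KMS is an assumption here. The sketch only builds the keyword arguments for boto3's `put_object` (`Bucket`, `Key`, `Body`, `ServerSideEncryption`, `SSEKMSKeyId`), so it runs without AWS access. The `tenant_kms_keys` mapping and the tenant-prefixed object layout are hypothetical.

```python
from typing import Dict


def put_object_args(tenant_id: str, object_key: str, body: bytes,
                    bucket: str, tenant_kms_keys: Dict[str, str]) -> Dict[str, object]:
    """Build keyword arguments for boto3's S3 `put_object` using the tenant's own KMS key.

    Assumes SSE-KMS with one key per tenant; `tenant_kms_keys` is a hypothetical
    mapping from tenant_id to a KMS key ID or ARN.
    """
    if tenant_id not in tenant_kms_keys:
        # Fail closed: never fall back to a shared or default key.
        raise KeyError(f"no encryption key configured for tenant {tenant_id}")
    return {
        "Bucket": bucket,
        # Hypothetical layout: prefix object keys with the tenant ID.
        "Key": f"{tenant_id}/{object_key}",
        "Body": body,
        "ServerSideEncryption": "aws:kms",
        "SSEKMSKeyId": tenant_kms_keys[tenant_id],
    }
```

With boto3 installed and credentials configured, the result could be passed as `boto3.client("s3").put_object(**put_object_args(...))`. The function raises an error for a tenant without a configured key, so a missing key cannot silently lead to objects being stored under another tenant's key.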