Portal Automation
Automated invoice processing with OCR, Azure authentication, and SOAP integration
Overview
End-to-end invoice processing automation for a real estate corporation. PDF invoices are automatically retrieved from Azure SharePoint, parsed via OCR, validated, and transferred to the contractor portal through a SOAP interface. Authentication uses Azure Entra ID with OAuth2 Client Credentials and a ChainedTokenCredential fallback. The solution also includes service catalog mapping, error email notifications, and a comprehensive processing log that stores raw OCR data as JSONB.
Designed and developed independently across the full stack, from requirements analysis and customer contact through UI design to deployment. The hexagonal architecture clearly separates domain logic, the API layer, and backend integrations (Azure, SOAP, OCR, PostgreSQL). A schedule-based polling mechanism checks the SharePoint folder for new invoices every minute. Failed invoices are automatically moved to an error folder and escalated via email.
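The polling and error-routing flow described above can be illustrated with a minimal sketch using only the JDK. All names here (InvoiceInbox, InvoiceProcessor, ErrorNotifier, pollOnce, startPolling) are hypothetical and not taken from the project; the interfaces stand in for domain-side ports that SharePoint, OCR/SOAP, and email adapters would implement.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class InvoicePollingSketch {

    /** Hypothetical port: lists new invoices and moves failed ones aside. */
    interface InvoiceInbox {
        List<String> listNewInvoices();
        void moveToErrorFolder(String fileName);
    }

    /** Hypothetical port covering the OCR, validation, and transfer steps. */
    interface InvoiceProcessor {
        void process(String fileName) throws Exception;
    }

    /** Hypothetical port for the email escalation. */
    interface ErrorNotifier {
        void escalate(String fileName, Exception cause);
    }

    /** Runs one polling cycle and returns the number of invoices processed successfully. */
    static int pollOnce(InvoiceInbox inbox, InvoiceProcessor processor, ErrorNotifier notifier) {
        int processed = 0;
        for (String fileName : inbox.listNewInvoices()) {
            try {
                processor.process(fileName);
                processed++;
            } catch (Exception e) {
                // Error routing: park the file in the error folder, then escalate.
                inbox.moveToErrorFolder(fileName);
                notifier.escalate(fileName, e);
            }
        }
        return processed;
    }

    /** Schedules pollOnce with a one-minute delay between cycles. */
    static ScheduledExecutorService startPolling(
            InvoiceInbox inbox, InvoiceProcessor processor, ErrorNotifier notifier) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                pollOnce(inbox, processor, notifier);
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all later runs, so log and continue.
                System.err.println("Polling cycle failed: " + e.getMessage());
            }
        }, 0, 1, TimeUnit.MINUTES);
        return scheduler;
    }
}
```

Because the cycle depends only on the three interfaces, it can be exercised with in-memory implementations, which is the main practical benefit of keeping the ports in the domain.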
Tech Stack
Architecture Highlights
- Domain-Driven Design with repository interfaces in the domain and technology-specific adapters (Azure, SOAP, OCR, PostgreSQL)
- Schedule-based polling mechanism with automatic error routing
- WS-Security with XML signature, timestamp, and Java KeyStore for SOAP communication
- ChainedTokenCredential for resilient Azure authentication (username/password → client secret fallback)
- Flyway migrations with PostgreSQL JSONB for flexible OCR data storage
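The fallback behavior behind a chained credential can be sketched in plain Java. This is an illustration of the try-in-order pattern only, not the Azure SDK class itself; TokenSource and firstAvailableToken are hypothetical names, and each source stands in for one credential in the chain (for example username/password first, client secret second).

```java
import java.util.ArrayList;
import java.util.List;

final class CredentialChainSketch {

    /** Hypothetical stand-in for one credential; returns a token or throws. */
    interface TokenSource {
        String acquireToken();
    }

    /** Tries each source in order and returns the first non-blank token. */
    static String firstAvailableToken(List<TokenSource> chain) {
        List<String> failures = new ArrayList<>();
        for (int i = 0; i < chain.size(); i++) {
            try {
                String token = chain.get(i).acquireToken();
                if (token != null && !token.isBlank()) {
                    return token;
                }
                failures.add("source " + i + ": empty token");
            } catch (RuntimeException e) {
                // Remember the failure and fall through to the next source.
                failures.add("source " + i + ": " + e.getMessage());
            }
        }
        throw new IllegalStateException("All credentials failed: " + failures);
    }
}
```

Collecting the individual failure messages keeps the final error useful for diagnosis when every source in the chain is rejected.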
Key Features
- OCR of PDF invoices with Tesseract (German language support)
- Azure Entra ID authentication with OAuth2 and ChainedTokenCredential fallback
- SharePoint integration via Microsoft Graph API (read, move, folder management)
- SOAP integration with contractor portal using WS-Security and mutual certificate authentication
- Service catalog mapping for automatic invoice position assignment
- Comprehensive processing log with JSONB storage of raw OCR data
- Email escalation on processing errors
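As one possible reading of the service catalog mapping, the sketch below assigns a catalog code to an invoice position by keyword matching on the OCR text. The matching rule, the class name ServiceCatalogSketch, and the codes SVC-100 and SVC-200 are assumptions for illustration; the project's actual mapping logic is not described here.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

final class ServiceCatalogSketch {

    // Hypothetical keyword-to-code entries, checked in insertion order.
    private final Map<String, String> codeByKeyword = new LinkedHashMap<>();

    /** Registers a keyword (case-insensitive) for a catalog code. */
    ServiceCatalogSketch add(String keyword, String catalogCode) {
        codeByKeyword.put(keyword.toLowerCase(Locale.ROOT), catalogCode);
        return this;
    }

    /** Returns the code of the first keyword contained in the position text, if any. */
    Optional<String> match(String positionText) {
        String normalized = positionText.toLowerCase(Locale.ROOT);
        return codeByKeyword.entrySet().stream()
                .filter(entry -> normalized.contains(entry.getKey()))
                .map(Map.Entry::getValue)
                .findFirst();
    }
}
```

A position such as "Heizungswartung 2024" would match a registered keyword "wartung", while a position with no matching keyword yields an empty result that could be routed to manual review or the error path.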