Confidential · SME Fintech
A unified data fabric, predictable automated releases, and a high-performance foundation the team can scale on for the next 3 years without overhead spikes.
Context, scope, and success criteria
The client was scaling quickly, but core product and operations were constrained by fragmented legacy systems, release instability, and database transaction bottlenecks that caused severe customer churn.
- Decompose fragile monolithic billing, core ledger, and API router modules into highly scalable, containerized microservices.
- Reduce transaction failures and eliminate database lock contention so the transaction API meets a sub-250 ms latency SLA under peak load.
- Incorporate an automated AI-driven triage layer to categorize billing disputes, routing high-priority anomalies to fraud specialists.
- Automate CI/CD pipelines to allow zero-downtime progressive blue-green deployments, moving from monthly manual releases to multiple automated daily deploys.
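The triage objective above implies that model output is checked before it drives routing. The following is a minimal sketch of that step, not the delivered implementation: the queue names, the 0.7 confidence threshold, and the assumption that confidence is a value between 0 and 1 are all hypothetical and do not come from the engagement.

```typescript
// Hypothetical types and queue names; none of these appear in the case study.
type Priority = "HIGH" | "MEDIUM" | "LOW";

interface TriageResult {
  category: string;
  priority: Priority;
  confidence: number; // assumed to be in the range 0..1
}

const PRIORITIES: Priority[] = ["HIGH", "MEDIUM", "LOW"];

// Parse the model's raw text and reject anything that does not match the expected shape.
export function parseTriageResult(raw: string): TriageResult | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch (_err) {
    return null; // not valid JSON
  }
  if (typeof data !== "object" || data === null) return null;
  const { category, priority, confidence } = data as Record<string, unknown>;
  if (typeof category !== "string" || category.length === 0) return null;
  if (typeof priority !== "string" || PRIORITIES.indexOf(priority as Priority) === -1) return null;
  if (typeof confidence !== "number" || !isFinite(confidence)) return null;
  if (confidence < 0 || confidence > 1) return null;
  return { category, priority: priority as Priority, confidence };
}

// Pick a queue. Unparseable or low-confidence output falls back to a human reviewer.
export function routeDispute(raw: string, minConfidence = 0.7): string {
  const result = parseTriageResult(raw);
  if (result === null || result.confidence < minConfidence) return "manual-review";
  return result.priority === "HIGH" ? "fraud-specialists" : "standard-billing";
}
```

Falling back to manual review on any malformed or low-confidence answer keeps a model error from silently misrouting a dispute.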
Why the work started
Fragmented systems, brittle manual deployments, and severe database lockups were crippling operations. The legacy stack consisted of three tightly coupled services running on end-of-life virtualized hardware. Every deployment required a 4-hour midnight maintenance window, often followed by critical hotfixes. During peak SME trading volumes, database lock contention frequently pushed API transaction times from 200 ms to over 8.5 seconds, triggering cascading failures in the consumer-facing billing and ledger modules.
What we built
We rebuilt the core platform as an event-driven, cloud-native architecture on AWS. We decomposed the monolith using the strangler fig pattern, moving critical operations to Amazon ECS on AWS Fargate. We introduced Amazon EventBridge for event propagation and Amazon RDS for PostgreSQL with Amazon ElastiCache for Redis to relieve database bottlenecks, and built a custom AI triage agent on Amazon Bedrock (Claude 3.5 Sonnet) to categorize and route complex transaction disputes. The platform was secured with AWS KMS encryption and IAM permissions boundaries, delivered through a CI/CD pipeline on GitHub Actions, and monitored with Datadog and Amazon CloudWatch.
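As an illustration of event propagation over EventBridge, the sketch below builds one event entry for a posted transaction. The entry fields (Source, DetailType, Detail, EventBusName) follow the EventBridge PutEvents request shape; the event name, source string, bus name, and payload fields are hypothetical and were not taken from the client's system.

```typescript
// Mirrors the fields of an EventBridge PutEvents request entry.
interface LedgerEventEntry {
  Source: string;
  DetailType: string;
  Detail: string; // JSON-encoded payload
  EventBusName: string;
}

// Hypothetical payload for illustration only.
interface TransactionPosted {
  transactionId: string;
  accountId: string;
  amountMinor: number; // amount in minor currency units, e.g. pence
  currency: string;
}

// Build the entry a ledger service might publish after committing a transaction.
export function buildTransactionPostedEvent(
  tx: TransactionPosted,
  busName = "platform-events" // assumed bus name
): LedgerEventEntry {
  if (Math.floor(tx.amountMinor) !== tx.amountMinor) {
    throw new Error("amountMinor must be a whole number of minor units");
  }
  return {
    Source: "ledger.service", // assumed source identifier
    DetailType: "TransactionPosted",
    Detail: JSON.stringify(tx),
    EventBusName: busName,
  };
}
```

An entry like this would be passed in the `Entries` array of a `PutEventsCommand` from `@aws-sdk/client-eventbridge`, so that billing and other consumers react to the event instead of calling the ledger synchronously.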
Legacy versus modern flow
Legacy: three tightly coupled applications sharing a single monolithic relational database with no connection pooling. Deployment required fully stopping the servers, and a single ledger failure crashed the entire platform.
Modern: an event-driven microservices architecture on Amazon ECS with AWS Fargate, coordinated asynchronously via Amazon EventBridge. API requests are handled by Amazon API Gateway and routed with low latency. Data is partitioned across Amazon RDS for PostgreSQL databases with ElastiCache for Redis for caching, while an AI triage queue powered by Amazon Bedrock handles anomalies in near real time.
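One common way to use a Redis cache in front of PostgreSQL is the cache-aside pattern, sketched below under stated assumptions. The case study does not say which caching strategy was used, so this is an illustration only. An in-memory class stands in for Redis, the calls are synchronous for brevity (a real Redis client is asynchronous), and the key format, TTL, and function names are hypothetical.

```typescript
// Minimal cache interface; a Redis client would sit behind something similar.
interface Cache {
  get(key: string): string | undefined;
  set(key: string, value: string, ttlMs: number): void;
  del(key: string): void;
}

// In-memory stand-in for Redis with per-key expiry. The clock is injectable for testing.
class InMemoryCache implements Cache {
  private entries = new Map<string, { value: string; expiresAt: number }>();
  constructor(private now: () => number = () => Date.now()) {}

  get(key: string): string | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: treat as a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: string, ttlMs: number): void {
    this.entries.set(key, { value, expiresAt: this.now() + ttlMs });
  }

  del(key: string): void {
    this.entries.delete(key);
  }
}

// Cache-aside read: try the cache, fall back to the database on a miss, then populate the cache.
export function getAccountLimit(
  accountId: string,
  cache: Cache,
  loadFromDb: (id: string) => number,
  ttlMs = 5000
): number {
  const key = `limit:${accountId}`;
  const cached = cache.get(key);
  if (cached !== undefined) return Number(cached);
  const limit = loadFromDb(accountId);
  cache.set(key, String(limit), ttlMs);
  return limit;
}

// Cache-aside write path: after updating the database, drop the cached copy.
export function invalidateAccountLimit(accountId: string, cache: Cache): void {
  cache.del(`limit:${accountId}`);
}
```

Serving repeated reads from the cache takes them off the database, which reduces load on the path where contention showed up at peak.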
Playbook excerpt
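import { BedrockRuntimeClient, InvokeModelCommand } from "@aws-sdk/client-bedrock-runtime";

const client = new BedrockRuntimeClient({ region: "us-east-1" });

// Classify a billing dispute with Claude 3.5 Sonnet on Amazon Bedrock.
export async function triageDispute(disputeText: string) {
  const prompt = `
You are an SRE Fraud and Ledger anomaly triage agent.
Categorize the following dispute and extract the priority (HIGH/MEDIUM/LOW).
Dispute: "${disputeText}"
Output JSON format exactly: { "category": string, "priority": "HIGH"|"MEDIUM"|"LOW", "confidence": number }
`;
  const response = await client.send(new InvokeModelCommand({
    modelId: "anthropic.claude-3-5-sonnet-20240620-v1:0",
    contentType: "application/json",
    accept: "application/json",
    // Claude 3 models on Bedrock expect the Messages API request shape.
    body: JSON.stringify({
      anthropic_version: "bedrock-2023-05-31",
      max_tokens: 150,
      temperature: 0.1,
      messages: [{ role: "user", content: [{ type: "text", text: prompt }] }]
    })
  }));
  // The response body is a Messages API envelope; the model's JSON reply is its first text block.
  const envelope = JSON.parse(new TextDecoder().decode(response.body));
  return JSON.parse(envelope.content[0].text);
}
Migration timeline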
The engagement ran as a phased migration.
How the work was executed
Completed architecture decomposition and transaction dependency mapping in a two-week discovery sprint.
Migrated high-frequency ledger and billing services in phases using a strangler pattern with AWS API Gateway traffic proxying.
Implemented real-time data replication using AWS Database Migration Service (DMS), maintaining dual-write synchronization for 10 days before final cut-over.
Engineered the Bedrock LLM dispute classification engine, reducing customer ticket resolution loops from 48 hours to less than 15 minutes.
Deployed an automated CI/CD pipeline with Terraform-managed environments, blue-green deployment rules, Datadog observability, and SLO tracking.
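The strangler-style migration and progressive releases described in these stages both come down to a per-request routing decision between the legacy system and a new service. The sketch below shows that decision at application level. It is an assumption-laden illustration, not the Amazon API Gateway configuration used in the engagement: the rule shape, the percentage rollout, and the hash function are all hypothetical.

```typescript
// Hypothetical routing rule: a path prefix owned by a migrated service,
// plus the share of traffic (0..100) that should reach the new service.
interface RouteRule {
  prefix: string;
  newServicePercent: number;
}

// Deterministic bucket in 0..99 so a given key always lands on the same side during a rollout.
function bucket(key: string): number {
  let hash = 0;
  for (let i = 0; i < key.length; i++) {
    hash = (hash * 31 + key.charCodeAt(i)) >>> 0; // keep within unsigned 32-bit range
  }
  return hash % 100;
}

// Choose a backend for one request. Paths that no rule covers stay on the legacy system.
export function chooseBackend(
  path: string,
  stickyKey: string, // e.g. an account ID, so one customer sees one backend
  rules: RouteRule[]
): "legacy" | "new" {
  let match: RouteRule | undefined;
  for (const rule of rules) {
    const matches = path.indexOf(rule.prefix) === 0;
    // Longest matching prefix wins.
    if (matches && (!match || rule.prefix.length > match.prefix.length)) {
      match = rule;
    }
  }
  if (!match) return "legacy";
  return bucket(stickyKey) < match.newServicePercent ? "new" : "legacy";
}
```

Raising `newServicePercent` step by step shifts traffic gradually, and setting it back to 0 acts as a rollback without a redeploy.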
Controls and delivery rhythm
Delivery was run through weekly architecture and risk reviews with shared KPI tracking for product, engineering, and operations.
The client achieved predictable release operations, lower run cost, no observed database lock contention under high load, and a solid foundation to support multi-year growth plans.
Results and next steps
Phase two focuses on deeper revenue analytics and expansion of AI-assisted operations across customer and risk teams.
Have similar architecture bottlenecks?
We can map the same modernization pattern to your infrastructure, release process, and operating model.
Yesp Studio