Yesp Studio
Case study
Banking · Cloud Transformation · AWS

Confidential · SME Fintech

A unified data fabric, predictable automated releases, and a high-performance foundation the team can scale on for the next 3 years without overhead spikes.

Engagement at a glance
Timeline
5 months
Team
1 architect · 3 engineers · 1 PM
Industry
Banking
ROI
40% lower infra cost · 9-month payback.
Engagement overview

Context, scope, and success criteria

The client was scaling quickly, but core product and operations were constrained by fragmented legacy systems, release instability, and database transaction bottlenecks that caused severe customer churn.

Project snapshot
Primary stack
AWS cloud native
Objectives
  • Decompose fragile monolithic billing, core ledger, and API router modules into highly scalable, containerized microservices.
  • Eliminate transaction failures and database lock contention to meet a transaction API SLA of under 250 ms at peak load.
  • Incorporate an automated AI-driven triage layer to categorize billing disputes, routing high-priority anomalies to fraud specialists.
  • Automate CI/CD pipelines to allow zero-downtime progressive blue-green deployments, moving from monthly manual releases to multiple automated daily deploys.
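
The latency objective above can be turned into a measurable check. The following is a minimal sketch, assuming the SLA is evaluated at the 95th percentile over a window of latency samples; the percentile choice and the helper names are assumptions, not details from the engagement.

```typescript
// Nearest-rank percentile over latency samples in milliseconds.
export function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  // Smallest value that has at least p% of the samples at or below it.
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
}

// Target taken from the objective (sub-250 ms); evaluating it at p95 is an assumption.
export const SLA_TARGET_MS = 250;

export function meetsLatencySla(samplesMs: number[], p: number = 95): boolean {
  return percentile(samplesMs, p) < SLA_TARGET_MS;
}
```

A window containing a single multi-second outlier fails this check at p95 when the window is small, which is one reason the percentile and window size should be agreed with the client up front.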
Challenge

Why the work started

Fragmented systems, brittle manual deployments, and severe database lock contention were crippling operations. The legacy stack consisted of three tightly coupled services running on end-of-life virtualized hardware. Every deployment required a 4-hour midnight maintenance window and was often followed by critical hotfixes. During peak SME trading volumes, database lock contention frequently pushed API transaction times from 200 ms to over 8.5 seconds, triggering cascading failures in the consumer-facing billing and ledger modules.

Solution

What we built

We rebuilt the core platform as an event-driven, cloud-native architecture on AWS. We decomposed the monolith using the Strangler Fig pattern, moving critical operations to Amazon ECS on AWS Fargate. We introduced Amazon EventBridge for event propagation and Amazon RDS for PostgreSQL with Amazon ElastiCache for Redis to relieve the database bottlenecks, and built a custom AI triage agent on Amazon Bedrock (Claude 3.5 Sonnet) to categorize and route complex transaction disputes. The platform was secured with AWS KMS encryption and IAM permissions boundaries, and integrated into a CI/CD pipeline via GitHub Actions, with observability through Datadog and Amazon CloudWatch.
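
One common way a Redis cache takes read pressure off a relational database is the cache-aside pattern. The sketch below illustrates that pattern under stated assumptions: the case study does not say which caching strategy was used, the `KeyValueCache` interface and `cacheAside` helper are hypothetical, and an in-memory map stands in for Redis so the example runs on its own.

```typescript
// Minimal key-value interface; a real deployment would back this with a Redis client.
export interface KeyValueCache {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

// In-memory stand-in for Redis with per-key expiry (illustration only).
export class InMemoryCache implements KeyValueCache {
  private store = new Map<string, { value: string; expiresAt: number }>();

  async get(key: string): Promise<string | undefined> {
    const hit = this.store.get(key);
    if (!hit) return undefined;
    if (Date.now() >= hit.expiresAt) {
      this.store.delete(key);
      return undefined;
    }
    return hit.value;
  }

  async set(key: string, value: string, ttlSeconds: number): Promise<void> {
    this.store.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
  }
}

// Cache-aside read: try the cache, fall back to the database loader, then populate the cache.
export async function cacheAside<T>(
  cache: KeyValueCache,
  key: string,
  ttlSeconds: number,
  loadFromDb: () => Promise<T>,
): Promise<T> {
  const cached = await cache.get(key);
  if (cached !== undefined) return JSON.parse(cached) as T;
  const fresh = await loadFromDb();
  await cache.set(key, JSON.stringify(fresh), ttlSeconds);
  return fresh;
}
```

With this shape, repeated reads of the same key within the TTL reach the database once. Writes still need an invalidation or short-TTL policy, which is not shown here.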

40%
Infra cost reduction
9 mo
Payback period
Deploy frequency
Architecture

Legacy versus modern flow

Platform structure
Legacy state

Three tightly coupled applications shared a single monolithic relational database with no connection pooling. Deployment required fully stopping servers, and a single ledger failure crashed the entire platform.

Modern stack

An event-driven microservices architecture on Amazon ECS with AWS Fargate, coordinated asynchronously via Amazon EventBridge. API requests are handled by Amazon API Gateway and routed with low latency. Data is partitioned across Amazon RDS for PostgreSQL databases with ElastiCache for Redis for caching, while an AI triage queue powered by Amazon Bedrock handles anomalies in real time.
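
To make the asynchronous coordination concrete, here is a minimal sketch of how a service might shape a domain event for EventBridge. The `EventEntry` interface mirrors the per-entry fields that the EventBridge `PutEvents` API accepts; the event name, source string, bus name, and payload fields are all hypothetical.

```typescript
// Local mirror of the per-entry fields accepted by EventBridge PutEvents.
export interface EventEntry {
  Source: string;
  DetailType: string;
  Detail: string; // JSON-encoded payload
  EventBusName: string;
}

// Hypothetical domain event; field names are illustrative only.
export interface LedgerEntryPosted {
  entryId: string;
  accountId: string;
  amountMinor: number; // amount in minor units, for example cents
  currency: string;
}

// Builds one PutEvents entry. The default bus name and source are assumptions.
export function toEventEntry(
  evt: LedgerEntryPosted,
  busName: string = "core-platform-bus",
): EventEntry {
  return {
    Source: "ledger.service",
    DetailType: "LedgerEntryPosted",
    Detail: JSON.stringify(evt),
    EventBusName: busName,
  };
}
```

Entries built this way could be passed in the `Entries` array of a `PutEventsCommand` from `@aws-sdk/client-eventbridge`. Consumers such as billing then subscribe through rules that match on `Source` and `DetailType` rather than calling the ledger directly.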

Engineering

Playbook excerpt

Technical spec
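import { BedrockRuntimeClient, InvokeModelCommand } from "@aws-sdk/client-bedrock-runtime";

const client = new BedrockRuntimeClient({ region: "us-east-1" });

export async function triageDispute(disputeText: string) {
  const prompt = `
  You are an SRE Fraud and Ledger anomaly triage agent.
  Categorize the following dispute and extract the priority (HIGH/MEDIUM/LOW).

  Dispute: "${disputeText}"

  Output JSON format exactly: { "category": string, "priority": "HIGH"|"MEDIUM"|"LOW", "confidence": number }
  `;

  const response = await client.send(new InvokeModelCommand({
    // Claude 3.5 Sonnet model ID on Bedrock; confirm it is enabled in your account and region.
    modelId: "anthropic.claude-3-5-sonnet-20240620-v1:0",
    contentType: "application/json",
    accept: "application/json",
    // Claude 3 models on Bedrock use the Messages API, not the legacy `prompt` field.
    body: JSON.stringify({
      anthropic_version: "bedrock-2023-05-31",
      max_tokens: 150,
      temperature: 0.1,
      messages: [{ role: "user", content: prompt }]
    })
  }));

  // The response envelope carries the model output as text in content[0].text.
  const envelope = JSON.parse(new TextDecoder().decode(response.body));
  return JSON.parse(envelope.content[0].text);
}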
Figure 2.1: Asynchronous Bedrock wrapper that classifies dispute text into a category, priority, and confidence score.
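
Model output is free text, so a wrapper like the one above benefits from validating the JSON before anything downstream trusts it. The following is a minimal sketch of such a guard, assuming the output shape requested in the prompt and assuming `confidence` is a score between 0 and 1; `parseTriageResult` is a hypothetical helper, not part of the delivered playbook.

```typescript
export type Priority = "HIGH" | "MEDIUM" | "LOW";

export interface TriageResult {
  category: string;
  priority: Priority;
  confidence: number;
}

const PRIORITIES: readonly string[] = ["HIGH", "MEDIUM", "LOW"];

// Parses raw model text and returns a typed result,
// or null when the text does not match the expected shape.
export function parseTriageResult(raw: string): TriageResult | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }
  if (typeof data !== "object" || data === null) return null;
  const { category, priority, confidence } = data as Record<string, unknown>;
  if (typeof category !== "string" || category.length === 0) return null;
  if (typeof priority !== "string" || !PRIORITIES.includes(priority)) return null;
  if (typeof confidence !== "number" || !Number.isFinite(confidence)) return null;
  // Assumption: confidence is a probability-like score in [0, 1].
  if (confidence < 0 || confidence > 1) return null;
  return { category, priority: priority as Priority, confidence };
}
```

Returning `null` instead of throwing lets the caller decide whether to retry the model call or send the dispute to a human queue.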
Execution

Migration timeline

The engagement ran as a phased migration.

Delivery

How the work was executed

01

Completed architecture decomposition and transaction dependency mapping in a 2-week discovery sprint.

02

Migrated high-frequency ledger and billing services in phases using a strangler pattern with AWS API Gateway traffic proxying.
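
The routing decision at the heart of a strangler migration can be sketched as a small function: requests for migrated path prefixes go to the new services, everything else still reaches the legacy stack. This is an illustration only; in the engagement the proxying was done by API Gateway, and the prefixes and names below are hypothetical.

```typescript
export type Upstream = "legacy" | "modern";

// Hypothetical route table: path prefixes already migrated to the new services.
export const MIGRATED_PREFIXES: string[] = ["/billing", "/ledger"];

// Returns which backend should serve a request path during the migration.
export function resolveUpstream(
  path: string,
  migrated: string[] = MIGRATED_PREFIXES,
): Upstream {
  // Match on whole path segments so "/billing-export" does not match "/billing".
  const matches = migrated.some(
    (prefix) => path === prefix || path.startsWith(prefix + "/"),
  );
  return matches ? "modern" : "legacy";
}
```

Each phase of the migration then amounts to adding a prefix to the table, and rolling back a phase means removing it again.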

03

Implemented real-time data replication using AWS Database Migration Service (DMS), maintaining dual-write synchronization for 10 days before final cut-over.
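
During a synchronization window like this, teams typically compare the two stores before cut-over. The sketch below shows one possible reconciliation check, assuming rows can be matched by ID and compared on a balance and a last-updated timestamp; the row shape and function names are hypothetical and do not describe the checks used in the engagement.

```typescript
export interface LedgerRow {
  id: string;
  balanceMinor: number; // balance in minor units
  updatedAt: string; // ISO 8601 timestamp
}

export interface DriftReport {
  missingInTarget: string[];
  missingInSource: string[];
  mismatched: string[];
}

// Compares legacy (source) rows with migrated (target) rows by ID.
export function compareStores(source: LedgerRow[], target: LedgerRow[]): DriftReport {
  const targetById = new Map<string, LedgerRow>();
  for (const row of target) targetById.set(row.id, row);
  const sourceIds = new Set<string>(source.map((row) => row.id));

  const report: DriftReport = { missingInTarget: [], missingInSource: [], mismatched: [] };
  for (const row of source) {
    const other = targetById.get(row.id);
    if (!other) {
      report.missingInTarget.push(row.id);
    } else if (other.balanceMinor !== row.balanceMinor || other.updatedAt !== row.updatedAt) {
      report.mismatched.push(row.id);
    }
  }
  for (const row of target) {
    if (!sourceIds.has(row.id)) report.missingInSource.push(row.id);
  }
  return report;
}

// Cut-over is considered safe only when no drift of any kind is reported.
export function safeToCutOver(report: DriftReport): boolean {
  return (
    report.missingInTarget.length === 0 &&
    report.missingInSource.length === 0 &&
    report.mismatched.length === 0
  );
}
```

In practice a comparison like this would run on sampled or checksummed batches rather than full tables, but the pass or fail criterion stays the same.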

04

Engineered the Bedrock LLM dispute classification engine, reducing customer ticket resolution loops from 48 hours to less than 15 minutes.
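
Once a dispute is classified, the routing step can be a plain function of priority and confidence. This is a minimal sketch under stated assumptions: the queue names and the 0.7 confidence threshold are hypothetical, and only the rule that high-priority anomalies go to fraud specialists comes from the engagement objectives.

```typescript
export type Priority = "HIGH" | "MEDIUM" | "LOW";

export interface Classification {
  category: string;
  priority: Priority;
  confidence: number; // assumed to be in [0, 1]
}

// Hypothetical queue names.
export type Queue = "fraud-specialists" | "billing-support" | "manual-review";

// Assumed threshold below which a human re-checks the classification.
export const MIN_CONFIDENCE = 0.7;

export function routeDispute(c: Classification): Queue {
  if (c.confidence < MIN_CONFIDENCE) return "manual-review";
  return c.priority === "HIGH" ? "fraud-specialists" : "billing-support";
}
```

Keeping this step deterministic and outside the model makes the routing policy auditable and easy to change without touching the prompt.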

05

Deployed an automated CI/CD pipeline with Terraform-provisioned environments, blue-green deployment rules, Datadog observability, and SLO tracking.
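
A blue-green deployment rule ultimately reduces to a gate that decides whether the new (green) environment takes over. The sketch below shows one way such a gate might look, assuming it is driven by request count, error rate, and p95 latency; the metric names and the sample thresholds are assumptions rather than the client's actual rules.

```typescript
export interface EnvironmentMetrics {
  requests: number;
  errors: number;
  p95LatencyMs: number;
}

export interface PromotionPolicy {
  minRequests: number; // traffic needed before a decision is made
  maxErrorRate: number; // fraction of failed requests, for example 0.001
  maxP95LatencyMs: number;
}

export type Decision = "promote" | "rollback" | "wait";

// Decides what to do with the green environment based on its observed metrics.
export function decidePromotion(green: EnvironmentMetrics, policy: PromotionPolicy): Decision {
  // Not enough traffic yet to judge (also avoids dividing by zero).
  if (green.requests === 0 || green.requests < policy.minRequests) return "wait";
  const errorRate = green.errors / green.requests;
  if (errorRate > policy.maxErrorRate) return "rollback";
  if (green.p95LatencyMs > policy.maxP95LatencyMs) return "rollback";
  return "promote";
}
```

For example, a policy of `{ minRequests: 1000, maxErrorRate: 0.001, maxP95LatencyMs: 250 }` would tie the gate to the 250 ms latency objective, with the error-rate limit chosen by the team.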

Governance

Controls and delivery rhythm

Delivery was run through weekly architecture and risk reviews with shared KPI tracking for product, engineering, and operations.

The client achieved predictable release operations, lower run costs, no database lock contention under high load, and a solid foundation to support multi-year growth plans.

Outcome

Results and next steps

Business outcome
40% lower infra cost · 9-month payback.

A unified data fabric, predictable automated releases, and a high-performance foundation the team can scale on for the next 3 years without overhead spikes.

Next phase

Phase two focuses on deeper revenue analytics and expansion of AI-assisted operations across customer and risk teams.
