Nitish John Toppo

Dec 01 2025|9 min read

Client Background

Our client, a leading investment management firm, faced significant inefficiencies in its due diligence questionnaire (DDQ) process. Its workflows were heavily manual, time-consuming, and error-prone, driven by unstructured document formats, repetitive Q&A patterns, and inconsistent documentation standards.

Problem Statement

Enterprises handling client due diligence face time-consuming, error-prone processes driven by unstructured formats, repeated question-answer patterns, and inconsistent documentation. Extracting questions from unstructured PDFs, matching new or reformulated questions to prior answers, and completing DDQs accurately and quickly are all sources of inefficiency. Moreover, using generative AI in this sensitive domain raises ethical concerns, including hallucination risks, unverifiable responses, and compliance misalignment.

Project Objectives

Develop a solution to automate DDQ answering using historical Q&A data and LLM-based document understanding.
Ensure ethical AI practices, including hallucination filtering, confidence scoring, and policy-aligned controls.
Deliver high-quality, contextually accurate autofill suggestions that reduce manual effort and turnaround time.
Improve scalability by supporting DDQ completion across multiple clients with repeatable and reliable automation.

Scope of Work

Data Extraction: Parse structured and unstructured client DDQ documents to extract questions, answers, and metadata.
Semantic Retrieval: Generate embeddings for historical Q&A data and perform similarity matching to identify best-match responses.
Answer Completion: Automatically fill in blank DDQ fields using verified answers or contextually relevant LLM-generated suggestions.
Guardrails and Controls: Implement hallucination filtering, confidence scoring, audit trails, and red-flag alerts for ambiguous matches to ensure ethical, policy-aligned automation.
Scalable Framework: Enable secure, fast, and repeatable DDQ processing across diverse clients with minimal manual intervention.
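
The retrieval and guardrail steps above can be illustrated with a minimal sketch. It assumes embeddings are already available as plain lists of floats (in the project they would come from the embedding model) and uses toy three-dimensional vectors. The names `Match` and `best_match`, the routing labels, and the thresholds 0.90 and 0.75 are hypothetical choices for illustration, not values from the engagement.

```python
import math
from dataclasses import dataclass


@dataclass
class Match:
    question: str
    answer: str
    score: float
    action: str  # "autofill", "review", or "escalate" (hypothetical labels)


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def best_match(query_vec, history, autofill_at=0.90, review_at=0.75):
    """Return the closest historical Q&A pair plus a routing decision.

    history is a list of (question, answer, embedding) tuples.
    The thresholds are illustrative assumptions only.
    """
    if not history:
        return None
    scored = [(cosine_similarity(query_vec, emb), q, a) for q, a, emb in history]
    score, question, answer = max(scored, key=lambda item: item[0])
    if score >= autofill_at:
        action = "autofill"   # confident match: reuse the verified answer
    elif score >= review_at:
        action = "review"     # ambiguous match: raise a red flag for a human
    else:
        action = "escalate"   # no usable match: do not autofill
    return Match(question, answer, score, action)


# Toy history with made-up embeddings.
history = [
    ("Do you maintain a business continuity plan?", "Yes, tested annually.", [1.0, 0.0, 0.0]),
    ("Who is your external auditor?", "See the latest audited report.", [0.0, 1.0, 0.0]),
]
result = best_match([0.9, 0.1, 0.0], history)
print(result.action, round(result.score, 3))
```

Keeping the score on the returned record means every suggestion can be written to an audit trail together with the routing decision that was taken.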

Approach Followed

Requirement Analysis: Engage stakeholders to identify typical DDQ formats, sources of historical answers, compliance constraints, and workflow expectations.
System Design: Architect fallback flows, escalation triggers, and human-in-the-loop checkpoints to ensure accuracy and trust.
Knowledge Integration: Index historical Q&A pairs using vector databases to enable efficient semantic retrieval.
Prompt Engineering: Develop robust prompts that construct answers only from matched historical context, minimizing hallucination risks.
Pilot Deployment: Test solution performance on real-world DDQs, iterating on user feedback to fine-tune answer accuracy, confidence thresholds, and auditability.
Enterprise Rollout: Deliver training materials, onboard teams, and monitor solution adoption and continuous improvement cycles.
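
As a rough sketch of the prompt engineering and human-in-the-loop steps above, the snippet below builds a prompt restricted to matched historical answers and then checks the model's reply before it is accepted. It makes no LLM call. The sentinel `INSUFFICIENT_CONTEXT`, the `[n]` citation convention, and the function names are assumptions introduced for illustration, not the prompts used in the project.

```python
import re

# Hypothetical sentinel the model is asked to return when context is lacking.
REFUSAL_TOKEN = "INSUFFICIENT_CONTEXT"


def build_prompt(question, matches):
    """Build a grounded prompt from (prior_question, prior_answer) pairs."""
    context = "\n\n".join(
        f"[{i}] Q: {q}\nA: {a}" for i, (q, a) in enumerate(matches, start=1)
    )
    return (
        "You are completing a due diligence questionnaire.\n"
        "Answer using ONLY the prior answers below and cite them as [n].\n"
        f"If they do not support an answer, reply exactly {REFUSAL_TOKEN}.\n\n"
        f"Prior answers:\n{context}\n\n"
        f"New question: {question}\nAnswer:"
    )


def needs_human_review(model_output, num_matches):
    """Return True when the reply should go to a reviewer instead of autofill."""
    text = model_output.strip()
    if text == REFUSAL_TOKEN:
        return True  # model declined: escalate
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", text)}
    if not cited:
        return True  # no citation: the answer cannot be verified
    # A citation outside the supplied context suggests a fabricated source.
    return any(n < 1 or n > num_matches for n in cited)


matches = [("Do you maintain a business continuity plan?", "Yes, tested annually.")]
prompt = build_prompt("Is your BCP tested, and how often?", matches)
print(prompt)
print(needs_human_review("Yes, the plan is tested annually [1].", len(matches)))
```

Requiring a citation to a supplied prior answer gives reviewers a direct way to verify each suggestion, and any reply that fails the check falls back to manual completion.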

Tech Stack

Layer	Tools / Technologies
Document Parsing	PDF parsers, OCR tools
NLP & LLM	AWS Bedrock, LangChain
Semantic Storage and Retrieval	DocumentDB, cosine similarity
Embedding Models	Amazon Titan Embedding v2
Backend & APIs	FastAPI / Flask
Deployment	Docker, Kubernetes (Azure AKS / AWS EKS), CI/CD Pipelines
Authentication & AuthZ	Azure AD, OAuth2, JWT
