AI in Fintech

RAG for Merchant Integration Support: A Production Playbook

May 13, 2026 · 9 min read · By Rizwan Zafar

Merchant integration support is the cleanest place to put a GenAI bot in a payments platform. The questions are patterned, the answers live in docs you already maintain, and the failure mode (a wrong answer) has a cheap recovery (a human takes over).

We shipped this surface at Simpaisa and cut merchant integration support time by ~65%. This is the playbook.

Why RAG, not fine-tuning

Two reasons. First, your integration docs change every sprint: new endpoints, new error codes, new SDKs. A fine-tuned model goes stale fast, and re-fine-tuning is slow and expensive. RAG just re-indexes.

Second, you need citations. Every answer the bot gives a merchant should be traceable to a doc page. Without citations you cannot defend the answer in a post-incident review.
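Both reasons show up in what each indexed record carries. Below is a minimal Python sketch. It assumes pages are keyed by a canonical doc URL, which doubles as the citation target, and that your index stores a content hash per page. `DocPage`, `content_hash` and `plan_reindex` are illustrative names, not a specific library's API. Under these assumptions, re-indexing means diffing hashes rather than retraining anything:

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class DocPage:
    url: str   # canonical doc URL; also what the bot cites
    text: str  # rendered page content


def content_hash(text: str) -> str:
    """Stable fingerprint of a page's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(pages: list[DocPage], indexed: dict[str, str]) -> tuple[list[str], list[str]]:
    """Diff the live docs against the index (url -> stored hash).

    Returns (to_embed, to_delete): new or changed pages need (re-)embedding;
    pages gone from the docs site must leave the index so they stop being cited.
    """
    current = {p.url: content_hash(p.text) for p in pages}
    to_embed = sorted(url for url, h in current.items() if indexed.get(url) != h)
    to_delete = sorted(url for url in indexed if url not in current)
    return to_embed, to_delete


# Example: one unchanged page, one new page, one page removed from the site.
pages = [
    DocPage("/docs/payments/create", "POST /payments ..."),
    DocPage("/docs/webhooks", "Retry policy ..."),
]
indexed = {
    "/docs/payments/create": content_hash("POST /payments ..."),
    "/docs/v1/checkout": "old-hash",
}
print(plan_reindex(pages, indexed))
# → (['/docs/webhooks'], ['/docs/v1/checkout'])
```

Unchanged pages are skipped. Pages that disappear from the docs site are deleted from the index, so the bot can no longer cite them.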

The corpus is the product

Spend more time on the corpus than on the model. Three principles:

  • Curate, don't dump. Index the docs you actually want the bot to use. If your old v1 docs are still on the site, exclude them. The retrieval layer cannot tell stale from fresh.
  • Structure the error catalogue. Decline codes, integration error codes and webhook failure modes should live in a single canonical store, with each entry holding the code, its plain-English meaning, common causes, a suggested fix and links to the relevant doc sections.
  • Add the integration playbook. A merchant going from zero to a first successful transaction follows the same 8–12 steps. Make that an explicit document rather than leaving it implicit across 40 pages.
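The first two principles translate directly into data. Here is a minimal Python sketch. It assumes each doc carries an API-version label and an explicit deprecation flag; both are hypothetical metadata fields, so use whatever your docs platform exposes. The error-catalogue fields come from the list above:

```python
from dataclasses import dataclass, field


@dataclass
class ErrorEntry:
    """One record in the canonical error catalogue."""
    code: str
    meaning: str                      # plain-English meaning
    common_causes: list[str]
    suggested_fix: str
    doc_links: list[str] = field(default_factory=list)

    def to_chunk(self) -> str:
        """Flatten into one retrievable chunk, so a search for the code lands in one place."""
        return "\n".join([
            f"Error {self.code}: {self.meaning}",
            "Common causes: " + "; ".join(self.common_causes),
            f"Suggested fix: {self.suggested_fix}",
            "See: " + ", ".join(self.doc_links),
        ])


@dataclass
class CorpusDoc:
    url: str
    api_version: str          # hypothetical metadata, e.g. "v1" or "v2"
    deprecated: bool = False  # hypothetical explicit deprecation tag


def curate(docs: list[CorpusDoc], supported_versions: set[str]) -> list[CorpusDoc]:
    """Index only what the bot should use: supported versions, nothing tagged deprecated."""
    return [d for d in docs if d.api_version in supported_versions and not d.deprecated]


docs = [
    CorpusDoc("/docs/v1/checkout", "v1"),
    CorpusDoc("/docs/v2/checkout", "v2"),
    CorpusDoc("/docs/v2/legacy-signing", "v2", deprecated=True),
]
print([d.url for d in curate(docs, {"v2"})])
# → ['/docs/v2/checkout']
```

Rendering each catalogue entry as a single chunk is a deliberate design choice. A merchant who pastes an error code retrieves the meaning, causes and fix together, not fragments from several pages.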

Citation discipline is non-negotiable

Every answer must show the doc page(s) it came from. Two enforcement points:

  1. Prompt the LLM to refuse if it cannot ground the answer in retrieved context.
  2. Post-process the response: parse out the citation references; if there are none, drop the answer and hand off to a human.
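Both enforcement points fit in a few lines. The minimal Python sketch below assumes the prompt asks the model to cite retrieved chunks as `[doc:<id>]` and to emit a fixed refusal token when it cannot ground an answer. The token, the citation syntax and the function names are illustrative choices, not a standard. The sketch also goes one step beyond counting citations: an id that was never retrieved counts as no citation at all.

```python
import re
from dataclasses import dataclass

# Assumed conventions: cite as [doc:<chunk_id>], refuse with a fixed token.
CITATION_RE = re.compile(r"\[doc:([A-Za-z0-9_\-/.]+)\]")
REFUSAL_TOKEN = "CANNOT_ANSWER_FROM_DOCS"

# Enforcement point 1: the instruction given to the model.
SYSTEM_PROMPT = (
    "Answer only from the provided documentation excerpts. "
    "Cite every claim as [doc:<id>] using the excerpt ids. "
    f"If the excerpts do not contain the answer, reply exactly {REFUSAL_TOKEN}."
)


@dataclass
class Verdict:
    deliver: bool
    citations: list[str]
    reason: str


def enforce_citations(answer: str, retrieved_ids: set[str]) -> Verdict:
    """Enforcement point 2: drop ungrounded answers so they go to a human."""
    if REFUSAL_TOKEN in answer:
        return Verdict(False, [], "model_refused")
    cited = CITATION_RE.findall(answer)
    # Deduplicate in order, keeping only ids that were actually retrieved;
    # a made-up id is as bad as no citation.
    valid = [c for c in dict.fromkeys(cited) if c in retrieved_ids]
    if not valid:
        return Verdict(False, [], "no_valid_citation")
    return Verdict(True, valid, "ok")


print(enforce_citations("Send amounts in minor units [doc:payments-create].", {"payments-create"}))
# → Verdict(deliver=True, citations=['payments-create'], reason='ok')
print(enforce_citations("Send amounts in minor units.", {"payments-create"}).reason)
# → no_valid_citation
```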

This costs you 5–10% of answer volume in the early weeks. It saves you orders of magnitude more in trust.

Fallback paths designed before launch

Three fallback paths to design before you ship:

  • Low-confidence handoff. If retrieval similarity is low or the LLM signals uncertainty, route to a human with the question and the retrieved context attached.
  • Out-of-scope handoff. Questions about billing, contracts, dispute outcomes or anything else that affects money. The bot says "I'll route this" and then does.
  • Explicit human request. The merchant types "human" or "rep" and gets an instant handoff, with no friction.
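The three paths can be checked in a fixed order before any answer is sent. The Python sketch below is minimal. The similarity threshold, the keyword lists and the names are hypothetical placeholders to tune on your own traffic, and keyword matching is a deliberately crude stand-in for a proper topic classifier:

```python
import re
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    BOT = "bot"
    HUMAN_REQUESTED = "human_requested"
    OUT_OF_SCOPE = "out_of_scope"
    LOW_CONFIDENCE = "low_confidence"


MIN_SIMILARITY = 0.75  # hypothetical threshold on the best retrieval score
HUMAN_REQUEST_RE = re.compile(r"\b(human|rep)\b", re.IGNORECASE)
# Money-affecting topics the bot must never answer (illustrative list).
OUT_OF_SCOPE_RE = re.compile(
    r"\b(billing|invoices?|contracts?|disputes?|chargebacks?|fees?|settlements?|payouts?)\b",
    re.IGNORECASE,
)


@dataclass
class Decision:
    route: Route
    question: str
    context: list[str]  # retrieved chunks, attached so the human starts with full context


def route_question(question: str, top_similarity: float, context: list[str],
                   model_uncertain: bool = False) -> Decision:
    """Check the three fallback paths in priority order; otherwise let the bot answer."""
    if HUMAN_REQUEST_RE.search(question):
        return Decision(Route.HUMAN_REQUESTED, question, context)
    if OUT_OF_SCOPE_RE.search(question):
        return Decision(Route.OUT_OF_SCOPE, question, context)
    if top_similarity < MIN_SIMILARITY or model_uncertain:
        return Decision(Route.LOW_CONFIDENCE, question, context)
    return Decision(Route.BOT, question, context)


print(route_question("rep please", 0.9, []).route)
# → Route.HUMAN_REQUESTED
print(route_question("When is my settlement paid out?", 0.9, []).route)
# → Route.OUT_OF_SCOPE
print(route_question("How do I verify webhook signatures?", 0.4, ["..."]).route)
# → Route.LOW_CONFIDENCE
```

Explicit requests win over everything else, so a merchant who asks for a person never has to argue with the bot. The patterns err toward handing off. A false positive costs a human a few minutes; a false negative lets the bot improvise on a money question.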

If you don't design these paths, the bot will improvise, and that's where damage happens.

Operating model: who owns the bot?

Three teams co-own it: DevRel/Docs (corpus quality), product engineering (retrieval and LLM ops), and partner success (fallback handling and the feedback loop). They meet weekly to review the top questions, the lowest-confidence answers, the fallback-rate trend, and a citation-accuracy spot-check.

Common failure modes

  • Stale doc detected too late. The bot keeps answering with an old endpoint that has been deprecated. Mitigation: a quarterly doc audit plus explicit tagging of deprecated content.
  • Code-block hallucinations. LLMs love to invent SDK method names. Mitigation: match every code block in an answer against the actual SDK API surface; if anything fails to match, strip the code and hand off.
  • Overconfidence on currency conversions or settlement timing. Add specific guardrails for these topics, and never let the bot quote money.
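The Python sketch below shows one way to implement the code-block check. It assumes answers use fenced code blocks and that examples call the SDK through a client object named `client`, a convention you could enforce in the prompt. `SDK_METHODS` is a hypothetical allowlist; in practice you would generate it from the SDK's own source or reference docs at build time rather than maintain it by hand. A production check would parse each language properly, whereas this regex only catches `client.<method>(` calls:

```python
import re

# Hypothetical SDK surface; generate it from the real SDK at build time.
SDK_METHODS = {"create_payment", "get_payment", "refund_payment", "verify_webhook_signature"}

FENCE_RE = re.compile(r"```[\w+-]*\n(.*?)```", re.DOTALL)


def sdk_calls(code: str, receiver: str = "client") -> list[str]:
    """Method names invoked as <receiver>.<name>( inside a code block."""
    pattern = re.compile(rf"\b{re.escape(receiver)}\.([A-Za-z_]\w*)\s*\(")
    return pattern.findall(code)


def unknown_sdk_calls(answer: str, sdk_methods: set[str]) -> set[str]:
    """SDK calls in the answer's code blocks that the real SDK does not expose."""
    return {name for block in FENCE_RE.findall(answer)
            for name in sdk_calls(block) if name not in sdk_methods}


def check_code_blocks(answer: str, sdk_methods: set[str]) -> tuple[str, bool]:
    """Return (answer_to_send, needs_handoff); strip all code if any call is invented."""
    if not unknown_sdk_calls(answer, sdk_methods):
        return answer, False
    return FENCE_RE.sub("[code example removed; routing to an engineer]", answer), True


answer = "Try:\n```python\nclient.create_payment(amount=1000)\nclient.capture_now(pid)\n```\n"
print(unknown_sdk_calls(answer, SDK_METHODS))
# → {'capture_now'}
```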

What good looks like at 6 months

  • 80%+ first-turn resolution on patterned questions
  • <2% factual error rate (sample-audited weekly)
  • Average time-to-first-successful-API-call for new merchants halved
  • Doc team gets a feed of the "bot said X, human corrected to Y" pairs and uses it to improve the docs
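The first three targets can be checked against data you should already be logging. The Python sketch below is minimal. It assumes each conversation is tagged as patterned or not and that the weekly audit marks factual errors; both are hypothetical tagging steps. The thresholds are the targets listed above, and time-to-first-successful-API-call (TTFC) is compared by median against a pre-launch baseline:

```python
import statistics
from dataclasses import dataclass


@dataclass
class Conversation:
    patterned: bool              # tagged as a patterned question (hypothetical tagging step)
    resolved_first_turn: bool
    audited: bool = False        # part of the weekly sample audit
    factual_error: bool = False  # auditor marked the answer factually wrong


def rate(hits: int, total: int) -> float:
    if total == 0:
        raise ValueError("empty sample: cannot compute a rate")
    return hits / total


def scorecard(convs: list[Conversation], baseline_ttfc_hours: list[float],
              current_ttfc_hours: list[float]) -> dict[str, bool]:
    """Check the six-month targets; TTFC = time to first successful API call."""
    patterned = [c for c in convs if c.patterned]
    audited = [c for c in convs if c.audited]
    ftr = rate(sum(c.resolved_first_turn for c in patterned), len(patterned))
    err = rate(sum(c.factual_error for c in audited), len(audited))
    return {
        "first_turn_resolution >= 80%": ftr >= 0.80,
        "factual_error_rate < 2%": err < 0.02,
        "ttfc halved": statistics.median(current_ttfc_hours)
        <= statistics.median(baseline_ttfc_hours) / 2,
    }


convs = ([Conversation(True, True, audited=True) for _ in range(84)]
         + [Conversation(True, False, audited=True) for _ in range(15)]
         + [Conversation(True, True, audited=True, factual_error=True)])
print(scorecard(convs, baseline_ttfc_hours=[48, 72, 96], current_ttfc_hours=[20, 30, 40]))
# → {'first_turn_resolution >= 80%': True, 'factual_error_rate < 2%': True, 'ttfc halved': True}
```

Medians rather than means stop one merchant who stalled for a month from masking the trend. The fourth target, the correction feed for the doc team, is a process rather than a number, so it stays out of the scorecard.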

The compound effect (better bot → better docs → better bot) is the actual unlock.

FAQ

Do I need a vector DB? Yes. Start with whatever is already in your stack: pgvector, Pinecone, Weaviate. The DB choice matters far less than corpus quality.

Closed-model APIs or open-source LLMs? Either works. For merchant-facing surfaces with sensitive data, self-hosted open-source models have the edge on data egress. For internal-only tooling, vendor APIs are faster to ship.

How do you measure quality? Weekly: first-turn resolution rate, fallback rate and citation accuracy on a sampled audit. Monthly: merchant CSAT delta against a control group. Quarterly: a drift and regression audit on a held-out question set.

What's the biggest mistake teams make? Treating the corpus as a one-time setup. The corpus is the product. Treat it like one.

Tags
RAG, GenAI, merchant support, AI in payments, LLM, integration, developer experience