Vendor Data Enrichment Agent — 85% Reduction in Manual Data Cleanup

An autonomous agent that matched approximately 8,000 vendor records against Google Places with confidence scoring, filling in missing data and surfacing inaccuracies, and reduced manual data cleanup effort by 85%.

The Challenge

Point-of-sale platforms depend on accurate vendor data. Every restaurant, cafe, and bar on the platform needs correct addresses, phone numbers, trading hours, and operational status to ensure smooth ordering, delivery logistics, and customer-facing search results. When vendor records are incomplete or inaccurate, the downstream effects compound — failed deliveries, customer complaints, and erosion of platform trust.

This client's vendor database had grown organically over several years, absorbing data from multiple onboarding channels with inconsistent validation. Across approximately 8,000 active vendor records, a significant proportion had missing fields — no phone number, incomplete addresses, or outdated business names. Worse, some records contained data that was technically present but factually wrong: transposed digits in phone numbers, old trading names, or addresses that pointed to previous locations. The platform had no systematic way to distinguish between records that were correct, records that were incomplete, and records that were actively misleading.

The existing approach was manual: a data operations team would periodically audit vendor records, cross-referencing against Google Maps, business directories, and direct phone calls. This process was slow, expensive, and could never keep pace with the rate at which vendor data decayed — businesses move, change phone numbers, rebrand, or close. The client needed a solution that could continuously validate and enrich their entire vendor database without scaling headcount.

Our Approach

1. Confidence-Scored Google Places Matching

The core of the solution was an intelligent matching engine that took existing vendor data — business name, approximate location, and any available contact details — and attempted to resolve each record against the Google Places API. Rather than treating this as a simple lookup, the agent computed a confidence score for each match based on multiple signals: name similarity (accounting for abbreviations, trading name variations, and franchise formatting), geographic proximity, phone number overlap, and category alignment.
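
To make the idea of a multi-signal score concrete, here is a minimal sketch in Python. It is not the client's implementation: the weights, the 500 m distance cut-off, the suffix list, and the neutral 0.5 score for missing values are all assumptions chosen for illustration, and the `Business` type is hypothetical.

```python
import math
import re
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple

# Hypothetical weights; the case study does not publish the real ones.
WEIGHTS = {"name": 0.4, "distance": 0.3, "phone": 0.2, "category": 0.1}
# Assumed list of tokens to ignore when comparing business names.
NOISE_TOKENS = {"pty", "ltd", "inc", "llc", "the"}


@dataclass
class Business:
    name: str
    lat: float
    lng: float
    phone: Optional[str] = None
    category: Optional[str] = None


def normalize_name(name: str) -> str:
    """Lowercase, strip punctuation, and drop common legal suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(t for t in cleaned.split() if t not in NOISE_TOKENS)


def name_score(a: str, b: str) -> float:
    return SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()


def haversine_m(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in metres between two coordinates."""
    radius = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = p2 - p1, math.radians(lng2 - lng1)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(h))


def distance_score(a: Business, b: Business, max_m: float = 500.0) -> float:
    """1.0 at the same spot, falling linearly to 0.0 at max_m and beyond."""
    return max(0.0, 1.0 - haversine_m(a.lat, a.lng, b.lat, b.lng) / max_m)


def phone_score(a: Optional[str], b: Optional[str]) -> float:
    """Compare trailing 8 digits; 0.5 (neutral) when either number is unusable."""
    da, db = re.sub(r"\D", "", a or ""), re.sub(r"\D", "", b or "")
    if len(da) < 8 or len(db) < 8:
        return 0.5
    return 1.0 if da[-8:] == db[-8:] else 0.0


def category_score(a: Optional[str], b: Optional[str]) -> float:
    if not a or not b:
        return 0.5
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0


def score_match(vendor: Business, place: Business) -> Tuple[float, Dict[str, float]]:
    """Return the weighted confidence and the per-signal breakdown."""
    breakdown = {
        "name": name_score(vendor.name, place.name),
        "distance": distance_score(vendor, place),
        "phone": phone_score(vendor.phone, place.phone),
        "category": category_score(vendor.category, place.category),
    }
    total = sum(WEIGHTS[k] * breakdown[k] for k in WEIGHTS)
    return total, breakdown
```

Returning the breakdown alongside the total matters for the review step: a reviewer can see at a glance whether a low score came from a name mismatch or from the candidate being a few streets away.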

Matches scoring 90% or above were classified as high-confidence and processed automatically. Matches below 90% were flagged for human review, with the agent presenting the candidate match alongside the specific factors that reduced confidence — giving reviewers the context to make a fast decision rather than repeating the entire research process manually.
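
The routing rule itself is simple enough to sketch. The example below assumes a score in the range 0 to 1 and a per-signal breakdown dictionary; the `Decision` type, the `route_match` function, and the 0.8 cut-off used to call a signal "weak" are hypothetical names and values, not details from the project.

```python
from dataclasses import dataclass, field
from typing import Dict, List

AUTO_APPROVE_THRESHOLD = 0.90  # the 90% cut-off described in the text


@dataclass
class Decision:
    route: str  # "auto" or "review"
    score: float
    weak_factors: List[str] = field(default_factory=list)


def route_match(
    score: float,
    breakdown: Dict[str, float],
    threshold: float = AUTO_APPROVE_THRESHOLD,
    weak_below: float = 0.8,  # assumed cut-off for calling a signal "weak"
) -> Decision:
    """Auto-approve at or above the threshold; otherwise queue for review."""
    if score >= threshold:
        return Decision("auto", score)
    # List the signals that pulled the score down, weakest first.
    weak = sorted((k for k, v in breakdown.items() if v < weak_below), key=breakdown.get)
    return Decision("review", score, weak)
```

Keeping the threshold as a parameter rather than a constant buried in the logic is what makes it adjustable later without touching the matching code.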

2. Automated Data Enrichment Pipeline

Once a Google Place ID was confirmed (either automatically or via human approval), the agent extracted the full set of available fields from the Places API response and compared them against the existing vendor record. The enrichment pipeline operated on a field-by-field basis:

Missing fields were populated directly from the Places data (address components, phone number, website, ratings, business status).
Conflicting fields were flagged with both values presented — the agent applied heuristic rules (e.g., preferring the more recently updated source) but deferred to human review for ambiguous cases.
Matching fields, where the vendor record and the Places data agreed, were marked as verified and timestamped, creating an audit trail of when each data point was last checked.

This approach ensured that enrichment was additive and non-destructive — existing correct data was preserved, gaps were filled, and conflicts were surfaced rather than silently overwritten.
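
A field-by-field merge along these lines might look like the following sketch. It assumes both records are flat dictionaries with the same keys; the `FieldResult` type and `enrich` function are hypothetical, and the tie-breaking heuristics mentioned above (such as preferring the more recently updated source) are deliberately left out, so every conflict is simply surfaced.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class FieldResult:
    name: str
    action: str  # "populated", "confirmed", or "conflict"
    vendor_value: Any
    places_value: Any
    verified_at: Optional[str] = None  # ISO timestamp for the audit trail


def _same(a: Any, b: Any) -> bool:
    """Loose equality: ignore case and surrounding whitespace."""
    return str(a).strip().lower() == str(b).strip().lower()


def enrich(
    vendor: Dict[str, Any],
    places: Dict[str, Any],
    now: Optional[datetime] = None,
) -> Tuple[Dict[str, Any], List[FieldResult]]:
    """Fill gaps from Places, confirm agreements, and flag conflicts.

    The existing vendor value is never overwritten when the two disagree.
    """
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    merged = dict(vendor)
    results: List[FieldResult] = []
    for name, p_val in places.items():
        v_val = vendor.get(name)
        if p_val in (None, ""):
            continue  # Places has nothing to offer for this field
        if v_val in (None, ""):
            merged[name] = p_val
            results.append(FieldResult(name, "populated", v_val, p_val, stamp))
        elif _same(v_val, p_val):
            results.append(FieldResult(name, "confirmed", v_val, p_val, stamp))
        else:
            results.append(FieldResult(name, "conflict", v_val, p_val))
    return merged, results
```

For example, a vendor record with an empty phone number and an old street address would come back with the phone populated, the address untouched, and a "conflict" entry carrying both address values for a reviewer.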

3. Agent Architecture with ADK

The agent was built on Google's Agent Development Kit (ADK) running on Vertex AI, which provided the orchestration layer for managing the multi-step enrichment workflow. The ADK framework handled tool selection, retry logic, and state management across the pipeline — from initial record retrieval through Places API lookup, confidence scoring, enrichment, and result persistence.

Each vendor record was processed as an independent task, enabling parallel execution across the dataset. The agent maintained a processing ledger that tracked which records had been enriched, which were pending human review, and which had failed matching entirely — providing the operations team with a real-time dashboard of data quality across the entire vendor base.
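
The independent-task model and the ledger can be illustrated without any agent framework. The sketch below uses Python's standard `concurrent.futures` rather than ADK, so it should be read as an illustration of the pattern and not as the ADK API; `Status`, `process_record`, `run_batch`, and the `find_match` callback are hypothetical names.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from enum import Enum
from typing import Any, Callable, Dict, Iterable, Optional


class Status(Enum):
    ENRICHED = "enriched"
    PENDING_REVIEW = "pending_review"
    NO_MATCH = "no_match"


# A matcher returns a confidence in [0, 1], or None when no candidate exists.
Matcher = Callable[[Dict[str, Any]], Optional[float]]


def process_record(record: Dict[str, Any], find_match: Matcher, threshold: float = 0.90) -> Status:
    """Process one vendor record in isolation from all the others."""
    score = find_match(record)
    if score is None:
        return Status.NO_MATCH
    return Status.ENRICHED if score >= threshold else Status.PENDING_REVIEW


def run_batch(records: Iterable[Dict[str, Any]], find_match: Matcher, workers: int = 8) -> Dict[str, Status]:
    """Run records in parallel and return a ledger keyed by record id."""
    batch = list(records)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        statuses = pool.map(lambda r: process_record(r, find_match), batch)
        return {r["id"]: s for r, s in zip(batch, statuses)}


def summarize(ledger: Dict[str, Status]) -> Counter:
    """Counts per status, the kind of figure a dashboard would display."""
    return Counter(s.value for s in ledger.values())
```

Because no record depends on another, the worker count can be tuned to the API quota rather than to the structure of the data.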

4. Human-in-the-Loop Review Workflow

Records flagged for human review were queued with full context: the original vendor data, the candidate Google Places match, the computed confidence score, and a breakdown of which factors contributed to uncertainty. This reduced the average review time from several minutes of independent research to a rapid approve/reject decision — typically under 30 seconds per record.
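
One way to picture a queued review item is as a small record that carries everything the reviewer needs. The sketch below is an assumption about shape only; the `ReviewItem` type, its fields, and the text layout of `summary` are hypothetical and not taken from the client's tooling.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ReviewItem:
    vendor: Dict[str, Any]       # original vendor data
    candidate: Dict[str, Any]    # candidate Google Places match
    score: float                 # overall confidence, 0 to 1
    breakdown: Dict[str, float]  # per-signal scores
    decision: Optional[str] = None  # "approved" or "rejected" once reviewed

    def summary(self) -> str:
        """Render the item as plain text, weakest signal first."""
        lines = [f"Confidence: {self.score:.0%}"]
        for factor, value in sorted(self.breakdown.items(), key=lambda kv: kv[1]):
            lines.append(f"  {factor:<10} {value:.2f}")
        for key in sorted(set(self.vendor) | set(self.candidate)):
            lines.append(f"  {key}: {self.vendor.get(key)!r} -> {self.candidate.get(key)!r}")
        return "\n".join(lines)

    def decide(self, approved: bool) -> None:
        """Record the reviewer's approve/reject decision."""
        self.decision = "approved" if approved else "rejected"
```

Sorting the breakdown so the weakest signal appears first puts the reason for the flag at the top of the screen, which is what turns the review into a quick yes/no decision.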

Results

Metric	Value
Manual Effort Reduction	85% less time spent on vendor data cleanup
Records Processed	~8,000 active vendor records
Auto-Match Rate	High-confidence matches (≥90%) processed without human intervention
Data Coverage	Missing fields populated across address, phone, ratings, and business status
Review Efficiency	Human review time reduced to typically under 30 seconds per flagged record

The 85% reduction in manual effort freed the data operations team to focus on strategic vendor relationship management rather than routine data hygiene. The continuous enrichment pipeline also established a foundation for ongoing data quality — rather than periodic manual audits, vendor records are now validated and refreshed on a rolling basis.

Technical Architecture

Component	Technology
Agent Framework	Google Agent Development Kit (ADK)
Cloud Platform	Google Cloud — Vertex AI
Data Source	Google Places API (Place ID resolution + field extraction)
Matching Engine	Confidence-scored fuzzy matching (name, location, phone, category)
Confidence Threshold	90%: auto-approve at or above, human review below
Enrichment Strategy	Field-by-field: populate missing, flag conflicts, validate existing
Human Review	Contextual queue with candidate comparison and score breakdown
Processing Model	Parallel task execution with processing ledger
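
For readers unfamiliar with the Data Source row, the sketch below shows how a Place ID lookup request could be assembled. It assumes the Text Search endpoint of the Places API (New); the case study does not state which Places API version or endpoint was used, and the field mask, the 500 m bias radius, and the vendor dictionary keys are illustrative choices. The function only builds the request and does not send it.

```python
import json
import urllib.request
from typing import Any, Dict

# Text Search endpoint of the Places API (New); assumed, not confirmed by the source.
SEARCH_URL = "https://places.googleapis.com/v1/places:searchText"

# Only the fields listed in the mask are returned in the response.
FIELD_MASK = ",".join([
    "places.id",
    "places.displayName",
    "places.formattedAddress",
    "places.location",
    "places.nationalPhoneNumber",
    "places.websiteUri",
    "places.rating",
    "places.businessStatus",
])


def build_search_request(vendor: Dict[str, Any], api_key: str, radius_m: float = 500.0) -> urllib.request.Request:
    """Build (but do not send) a Text Search request for one vendor record."""
    query = f"{vendor['name']} {vendor.get('address', '')}".strip()
    body: Dict[str, Any] = {"textQuery": query}
    if "lat" in vendor and "lng" in vendor:
        # Bias results towards the vendor's approximate location.
        body["locationBias"] = {
            "circle": {
                "center": {"latitude": vendor["lat"], "longitude": vendor["lng"]},
                "radius": radius_m,
            }
        }
    return urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Goog-Api-Key": api_key,
            "X-Goog-FieldMask": FIELD_MASK,
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call; the JSON response carries a `places` list whose entries can then be scored against the vendor record.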

Key Takeaways

Confidence scoring transforms automation boundaries. A binary match/no-match approach would have required either excessive human review or unacceptable error rates. The 90% confidence threshold gave the client a tunable dial between automation speed and data accuracy — one they could adjust as trust in the system grew.
Enrichment must be non-destructive. Silently overwriting vendor data with API results would have introduced new errors wherever the Places API was itself outdated. The field-by-field comparison with conflict flagging ensured that enrichment added value without destroying institutional knowledge embedded in the existing records.
Agent architecture enables continuous data quality. Unlike a one-off migration script, the ADK-based agent can be re-run as vendor data decays — new businesses, changed phone numbers, and closed locations are caught on subsequent passes rather than accumulating silently.
Human-in-the-loop is a feature, not a limitation. The review workflow was designed to make human judgment fast and well-informed, not to eliminate it. The result was a system the operations team trusted — which is ultimately what determines whether an automation tool gets adopted or abandoned.