How to Evaluate Data Quality Before You Buy
Data quality problems rarely show up in a vendor’s sales deck — they show up three months after purchase, when a marketing campaign bounces half its emails or an analysis produces numbers that don’t match reality. Evaluating quality properly before you buy is one of the highest-leverage steps in any data purchase, and it’s also one of the most skipped, usually because teams don’t have a concrete framework for what to check. This guide gives you one.
The concrete dimensions of data quality
“Good data” is vague enough to mean whatever a vendor wants it to mean. Break it down into dimensions you can actually test:
- Accuracy — does the data correctly reflect reality? An email that resolves to the wrong person, or a company address that’s years out of date, is an accuracy failure even if the field is populated.
- Completeness — what percentage of records have a value in the fields you actually need? Overall completeness scores can hide the fact that your specific priority field is sparsely populated.
- Freshness — how recently was each record verified or updated? A dataset can be internally consistent and well-structured while still being stale.
- Consistency — are values formatted uniformly across the dataset (e.g., consistent date formats, standardized country names, consistent categorical labels)? Inconsistency creates downstream matching and deduplication problems even when the underlying data is correct.
- Provenance — can the vendor explain where each piece of data came from and how it was collected or verified? Data without a traceable source is harder to trust and harder to defend if its accuracy is ever questioned.
- Documentation — does the vendor provide a data dictionary, field definitions, and update logs, or do you have to reverse-engineer the schema yourself?
Treat these as a checklist, not a vague impression. A vendor that can speak fluently to all six, with evidence rather than assurances, is meaningfully more trustworthy than one that can’t.
How to request and evaluate a sample
A sample is the single most useful tool you have before committing to a purchase, but only if you request and test it properly.
- Ask for a sample that mirrors your real use case (same geography, same fields, and a size roughly proportional to your planned purchase), not a generic showcase sample the vendor hands out to everyone.
- Cross-check against known-good records. Pull fifty to a hundred records from your own systems that you’re confident are accurate, and compare the vendor’s data for those same entities. This single test often reveals more than any accuracy statistic the vendor publishes.
- Calculate completeness per field, not overall. If your use case depends specifically on verified phone numbers or a particular company attribute, measure completeness for that field alone.
- Check timestamp or “last verified” metadata, if provided, to gauge real freshness rather than trusting a general “updated regularly” claim.
- Look for formatting consistency across the sample — inconsistent date formats, inconsistent categorical values, or duplicate records under slightly different names are all signs of a weaker underlying pipeline.
If a vendor won’t provide a sample before purchase, treat that as a significant red flag on its own.
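The known-good cross-check described above can be scripted in a few lines. The following is a minimal sketch, not a definitive method: it assumes both datasets are lists of dictionaries sharing a join key, and the field names (`domain`, `email`, `city`), the sample values, and the helper names `field_accuracy` and `normalize` are all hypothetical. It reports, per field, the share of your trusted records that the vendor's data agrees with.

```python
def normalize(value):
    """Compare values case-insensitively and ignore surrounding whitespace."""
    return str(value).strip().lower() if value is not None else ""


def field_accuracy(known_good, vendor_sample, key, fields):
    """Per field, the share of known-good records the vendor sample agrees with.

    Entities missing from the vendor sample are skipped here: they are a
    coverage gap, which is reported separately, not an accuracy failure.
    """
    vendor_by_key = {row[key]: row for row in vendor_sample}
    results = {}
    for field in fields:
        compared = 0
        matched = 0
        for truth in known_good:
            vendor_row = vendor_by_key.get(truth[key])
            if vendor_row is None:
                continue
            compared += 1
            if normalize(vendor_row.get(field)) == normalize(truth.get(field)):
                matched += 1
        results[field] = matched / compared if compared else None
    return results


# Hypothetical records: three you trust, two of which the vendor also covers.
known_good = [
    {"domain": "acme.example", "email": "jo@acme.example", "city": "Berlin"},
    {"domain": "globex.example", "email": "sam@globex.example", "city": "Lyon"},
    {"domain": "initech.example", "email": "pat@initech.example", "city": "Austin"},
]
vendor_sample = [
    {"domain": "acme.example", "email": "JO@acme.example ", "city": "Berlin"},
    {"domain": "globex.example", "email": "old@globex.example", "city": "Lyon"},
]

accuracy = field_accuracy(known_good, vendor_sample, "domain", ["email", "city"])
vendor_keys = {row["domain"] for row in vendor_sample}
coverage = sum(1 for r in known_good if r["domain"] in vendor_keys) / len(known_good)

print(accuracy)   # per-field agreement with your trusted records
print(coverage)   # share of trusted entities present in the sample at all
```

Keeping coverage separate from accuracy is a deliberate choice in this sketch: a vendor that is missing an entity and a vendor that has the wrong value for it are different problems, and blending them into one number hides which one you are dealing with.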
Red flags in vendor quality claims
- Unverifiable superlatives — “99% accurate” or “industry-leading data” without any explanation of how that number was calculated or what it was measured against.
- No description of collection methodology — a vendor that can’t explain in plain language how the data was gathered (public web collection, licensed panels, government sources, user-submitted data) is harder to trust and harder to defend to your own stakeholders later.
- No refresh cadence disclosed — if you can’t get a clear answer on how often records are updated, assume the data may be stale.
- Inconsistent answers between sales and support — if the sales team’s claims about accuracy or coverage don’t match what a support or solutions engineer tells you, that’s worth investigating further before signing.
Practical validation techniques
Beyond a basic sample check, a few additional techniques help validate quality at scale:
- Cross-reference against a second, independent source for a subset of records, especially for high-stakes fields like verified contact details or financial figures.
- Spot-check outliers. Sort or filter the sample to surface unusual values (extremely old timestamps, missing country codes, improbable values) and manually review a handful. Outliers often expose systemic issues that a random sample might miss.
- Test the matching/deduplication logic if the data will be joined against your own records — check how well identifiers (company names, domains, IDs) actually match without heavy manual cleanup.
- Run a small pilot in production before scaling usage, if your contract structure allows it, so quality issues surface before you’re fully committed.
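To make the matching test above concrete, here is a rough sketch that measures how many of your own identifiers find a counterpart in the vendor's data, before and after light normalization. It assumes the join key is a company domain; the helper names `normalize_domain` and `match_rate` and the example values are hypothetical, and real identifiers may need different cleanup rules.

```python
from urllib.parse import urlparse


def normalize_domain(value):
    """Reduce a URL or bare domain to a lowercase host without a 'www.' prefix."""
    value = value.strip().lower()
    if "://" not in value:
        value = "//" + value  # lets urlparse treat the text as a network location
    host = urlparse(value).hostname or ""
    return host[4:] if host.startswith("www.") else host


def match_rate(own_values, vendor_values, normalize=None):
    """Share of distinct own identifiers that also appear in the vendor data."""
    clean = normalize if normalize is not None else (lambda v: v)
    own = {clean(v) for v in own_values}
    vendor = {clean(v) for v in vendor_values}
    if not own:
        return 0.0
    return len(own & vendor) / len(own)


# Hypothetical identifiers as they might appear in each system.
own_domains = ["https://www.acme.example", "globex.example", "Initech.example/about"]
vendor_domains = ["acme.example", "GLOBEX.example", "umbrella.example"]

raw_rate = match_rate(own_domains, vendor_domains)
normalized_rate = match_rate(own_domains, vendor_domains, normalize_domain)

print(raw_rate, normalized_rate)
```

A large gap between the raw and normalized rates suggests the join will need a cleanup step on every load. A low rate even after normalization points to a coverage or identifier problem that cleanup alone will not fix.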
Monitoring quality after purchase
Quality evaluation doesn’t end at purchase, because the quality of most data degrades over time as the real world changes:
- Set a recurring schedule (quarterly is common) to re-sample and re-check accuracy against known records, especially for people and company data, which decay quickly as jobs and organizational structures change.
- Track bounce rates, match rates, or error rates in production as an ongoing quality signal, not just at onboarding.
- Ask your vendor about their own update cadence and whether it has changed since your initial evaluation — vendors sometimes reduce refresh frequency for lower-tier plans without making it obvious.
- Keep a record of quality issues you encounter and raise them with the vendor; a responsive vendor that fixes flagged issues is a good sign for the relationship long-term, while a vendor that’s unresponsive to specific, documented complaints is a warning sign for renewal.
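One way to turn the production signals above into a recurring check is to compare current rates against the baseline you recorded at onboarding. This is a sketch under stated assumptions: the metric names, the example figures, and the 0.02 tolerance are illustrative placeholders, not recommended thresholds, and `flag_degradation` is a hypothetical helper.

```python
# Whether a higher value is good news for each metric (assumed metric names).
HIGHER_IS_BETTER = {"match_rate": True, "bounce_rate": False, "error_rate": False}


def flag_degradation(baseline, current, tolerance=0.02):
    """Return metrics that have worsened by more than `tolerance` since baseline.

    Values are rates between 0 and 1; the result maps each flagged metric to
    the size of the deterioration.
    """
    flagged = {}
    for metric, base in baseline.items():
        change = current[metric] - base
        worsening = -change if HIGHER_IS_BETTER[metric] else change
        if worsening > tolerance:
            flagged[metric] = round(worsening, 4)
    return flagged


# Hypothetical figures: onboarding baseline versus the latest quarter.
baseline = {"match_rate": 0.82, "bounce_rate": 0.03, "error_rate": 0.01}
latest = {"match_rate": 0.74, "bounce_rate": 0.04, "error_rate": 0.01}

flagged = flag_degradation(baseline, latest)
print(flagged)
```

Recording the flagged metrics alongside the date gives you the documented, specific evidence that makes a complaint to the vendor hard to dismiss.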
Where to go next
If you’re evaluating web-collected data specifically, our web data platforms category covers how providers like Bright Data and Oxylabs document their collection methodology. If you’re focused on company or contact enrichment, People Data Labs is a useful reference point for how API-first providers structure verification. The use cases for enriching company data and scraping public web data go into more detail on quality expectations specific to those workflows.
Frequently asked questions
What's the fastest way to test a vendor's data quality?
Request a sample that closely mirrors your real use case, then check it against a set of records you already know to be correct. Comparing against known-good records surfaces accuracy and completeness problems far faster than reading a vendor's stated quality metrics.
What data quality red flags should make me walk away from a vendor?
Refusal to provide a sample before purchase, vague or unverifiable accuracy claims, no documentation of collection or verification methodology, and no clear answer about how often the data is refreshed are all strong warning signs.
Does data quality stay the same after I buy it?
No. Most data decays over time — people change jobs, companies merge, prices update, listings expire. Plan for ongoing monitoring and periodic re-validation rather than treating a purchase as a one-time quality check.
How do I check completeness in a sample dataset?
Calculate the percentage of records with a non-null value for each field you actually need, not just an overall completeness score. A dataset can look 95% complete overall while the one field you care about most is only 60% populated.
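As a minimal sketch of that calculation, assuming the sample is loaded as a list of dictionaries and that both missing values and empty strings count as unpopulated (the field names, records, and function names are hypothetical):

```python
def is_blank(value):
    """Treat None and empty or whitespace-only strings as unpopulated."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def completeness_by_field(records, fields):
    """Share of records with a populated value, computed separately per field."""
    total = len(records)
    result = {}
    for field in fields:
        populated = sum(1 for r in records if not is_blank(r.get(field)))
        result[field] = populated / total if total else 0.0
    return result


# Hypothetical sample: every record has a name and domain, but phone is sparse.
sample = [
    {"name": "Acme", "domain": "acme.example", "phone": "+1 555 0100"},
    {"name": "Globex", "domain": "globex.example", "phone": ""},
    {"name": "Initech", "domain": "initech.example", "phone": None},
    {"name": "Umbrella", "domain": "umbrella.example", "phone": "+1 555 0101"},
    {"name": "Hooli", "domain": "hooli.example", "phone": "+1 555 0102"},
]

per_field = completeness_by_field(sample, ["name", "domain", "phone"])
overall = sum(per_field.values()) / len(per_field)

print(per_field)
print(round(overall, 2))
```

In this toy sample the overall figure looks healthier than the phone field on its own, which is the same masking effect described above on a smaller scale.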