Something puzzles me about waterfall data enrichment.
Waterfall enrichment has become more popular these days because it addresses three market gaps:
Comprehensive coverage of your entire target market(s)
Enhanced data accuracy and quality
Cost optimization through a pay-by-results model
It feels like things are coming full circle. I recall, 10+ years ago, having to subscribe to multiple data providers. It wasn't just for market coverage, but also to deal with a 'declining' return from providers over time, something I observed anecdotally but never really understood.
Let me explain. Every time we brought a new provider on board, we would get an influx of great data. Over time, however, we would see fewer new insights and find more inaccuracies. This degradation in data quality became a significant operational challenge. Our response was to manage a portfolio of providers and regularly churn one to bring another on board.
Fast forward to today: I am again hearing anecdotal reports of this issue, which makes waterfall enrichment an appealing way to source data from multiple providers.
The last element is cost control. Large datasets often require premium subscriptions with significant commitments, making the pay-per-use model of waterfall data enrichment particularly compelling.
Most waterfall enrichment providers use sequential logic, cycling through a predefined list of vendors until they find a match. Some may adjust the sequence based on your market and results. It's a bit hard to know what they really do, since vendors tend to keep their secret sauce under wraps.
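To make the sequential logic concrete, here is a minimal sketch of how I picture it. Everything in it is an assumption for illustration: the `waterfall_enrich` function, the `vendor_a` and `vendor_b` names, and the toy in-memory lookups stand in for real vendor APIs, and no actual provider's implementation is implied.

```python
from typing import Callable, Optional

# Hypothetical provider type: takes a lookup key (e.g. a company domain)
# and returns a record, or None when the vendor has no match.
Provider = Callable[[str], Optional[dict]]


def waterfall_enrich(key: str, providers: list[tuple[str, Provider]]):
    """Try each vendor in the predefined order; stop at the first match."""
    for name, lookup in providers:
        record = lookup(key)
        if record is not None:
            # With pay-by-results pricing, this is the only billable call.
            return name, record
    return None  # no vendor in the list had a match


# Toy in-memory stand-ins for real vendors (illustrative data only).
vendor_a = {"acme.com": {"employees": 120}}.get
vendor_b = {"acme.com": {"employees": 110}, "globex.com": {"employees": 45}}.get

providers = [("vendor_a", vendor_a), ("vendor_b", vendor_b)]

print(waterfall_enrich("acme.com", providers))
print(waterfall_enrich("globex.com", providers))
```

Note that the first match wins even when a later vendor holds a different value for the same key, which is exactly why the order of the list matters so much.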
The continued reliance on predefined sequences surprises me, particularly given the complexity of these data quality issues. While I understand that most providers may lack the scale to effectively apply statistical methods to vendor selection, I would expect some form of dynamic optimization.
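As a rough illustration of what I mean by dynamic optimization, the sketch below reorders vendors by their observed match rate instead of following a fixed list. It is only one simple possibility under stated assumptions: the `AdaptiveWaterfall` class and its method names are hypothetical, vendors are modeled as plain lookup functions, and match rate is used as a stand-in for quality (a real system would also need feedback on accuracy, not just on whether a record came back).

```python
from collections import defaultdict


class AdaptiveWaterfall:
    """Hypothetical sketch: order vendors by smoothed historical hit rate."""

    def __init__(self, providers):
        # providers: list of (name, lookup) pairs; lookup(key) -> dict or None
        self.providers = dict(providers)
        self.attempts = defaultdict(int)
        self.hits = defaultdict(int)

    def _score(self, name):
        # Laplace-smoothed hit rate, so untested vendors start at 0.5
        # instead of being ranked on no evidence at all.
        return (self.hits[name] + 1) / (self.attempts[name] + 2)

    def order(self):
        # sorted() is stable, so ties keep the original configured order.
        return sorted(self.providers, key=self._score, reverse=True)

    def enrich(self, key):
        for name in self.order():
            self.attempts[name] += 1
            record = self.providers[name](key)
            if record is not None:
                self.hits[name] += 1
                return name, record
        return None
```

The same idea could be tracked per segment (region, industry, company size), since a vendor that is strong in one market may be weak in another.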
I am also observing companies turning to solutions that let them connect to multiple sources and apply their own logic. This can be achieved through a diverse set of options from the likes of Cargo, Clay, Common Room, MadKudu, Keyplay, Openprise, and Unify.
I'd love to hear your experiences and insights on these key questions:
Are you able to work with a single data provider, or do you need to source your data from multiple ones?
Do you prefer to handle the multiple sources yourself, or are you turning to a provider to take this complexity off your plate?
Is the sequential lookup across the various data sources working for you, or do you need to apply a more sophisticated method to secure accurate data?
Please share your thoughts and experiences!