Wednesday, June 23, 2:30pm, 2405 Siebel Center

“Technical and organizational challenges and solutions in Yahoo!’s massive data warehousing environment”

Yahoo!’s Data Quality Team

Data is Yahoo!’s most strategic asset, and high-quality data is key to accurate insights and monetization.  This presentation describes how Yahoo! ensures high data quality (DQ) by applying recognized industry practices, customized for a massive data environment.  Following significant data quality wins in Yahoo! data systems, three success factors are important in addressing the next level of DQ issues: (1) a methodology that builds proactive and reactive capabilities into products up front and maintains an end-to-end data focus, resulting in system improvements and fast issue resolution, (2) an organization of the DQ program that combines a central team with teams embedded in the businesses and has a strong focus on customer engagement, and (3) solutions for technical challenges in the internet domain in Yahoo!’s massive data environment, including statistical monitoring and alerting, abuse and robot traffic detection, and the trade-off between latency and accuracy.