Data Cleaning : A Practical Perspective


Venkatesh Ganti
Book English 2013 · Electronic books.
Published
San Rafael : Morgan & Claypool Publishers, 2013.
Extent
1 online resource (87 p.)
Notes
Description based upon print version of record.
Preface; Acknowledgments; Introduction; Enterprise Data Warehouse; Comparison Shopping Database; Data Cleaning Tasks; Record Matching; Schema Matching; Deduplication; Data Standardization; Data Profiling; Focus of this Book; Technological Approaches; Domain-Specific Verticals; Generic Platforms; Operator-based Approach; Generic Data Cleaning Operators; Similarity Join; Clustering; Parsing; Bibliography; Similarity Functions; Edit Distance; Jaccard Similarity; Cosine Similarity; Soundex; Combinations and Learning Similarity Functions; Bibliography; Operator: Similarity Join; Set Similarity Join (SSJoin); Instantiations; Edit Distance; Jaccard Containment and Similarity; Implementing the SSJoin Operator; Basic SSJoin Implementation; Filtered SSJoin Implementation; Bibliography; Operator: Clustering; Definitions; Techniques; Hash Partition; Graph-based Clustering; Bibliography; Operator: Parsing; Regular Expressions; Hidden Markov Models; Training HMMs; Use of HMMs for Parsing; Bibliography; Task: Record Matching; Schema Matching; Record Matching; Bipartite Graph Construction; Weighted Edges; Graph Matching; Bibliography; Task: Deduplication; Graph Partitioning Approach; Graph Construction; Graph Partitioning; Merging; Using Constraints for Deduplication; Candidate Sets of Partitions; Maximizing Constraint Satisfaction; Blocking; Bibliography; Data Cleaning Scripts; Record Matching Scripts; Deduplication Scripts; Support for Script Development; User Interface for Developing Scripts; Configurable Data Cleaning Scripts; Bibliography; Conclusion; Bibliography; Authors' Biographies.
Data warehouses consolidate various activities of a business and often form the backbone for generating reports that support important business decisions. Errors in data tend to creep in for a variety of reasons. Some of these reasons include errors during input data collection and errors while merging data collected independently across different databases. These errors in data warehouses often result in erroneous upstream reports, and could impact business decisions negatively. Therefore, one of the critical challenges while maintaining large data warehouses is that of ensuring the quality o
ISBN
9781608456772
