The art of feature engineering : essentials for machine learning / Pablo Duboue.
Book · English · 2020 · Electronic books.

Original title
Extent
1 online resource (xii, 274 pages) : digital, PDF file(s).
Notes
Title from publisher's bibliographic system (viewed on 01 Jun 2020).

Contents:
Cover -- Half-title -- Title page -- Copyright information -- Dedication -- Contents -- Preface
Part One: Fundamentals
1 Introduction -- 1.1 Feature Engineering -- 1.2 Evaluation -- 1.2.1 Metrics -- 1.2.2 Cross-Validation -- 1.2.2.1 Out-of-Fold Estimation -- 1.2.3 Overfitting -- 1.2.4 Curse of Dimensionality -- 1.3 Cycles -- 1.3.1 ML Cycle -- 1.3.2 Feature Engineering Cycle -- 1.4 Analysis -- 1.4.1 Exploratory Data Analysis -- 1.4.2 Error Analysis -- 1.5 Other Processes -- 1.5.1 Domain Modelling -- 1.5.2 Feature Construction -- 1.5.2.1 Feature Types -- 1.5.2.2 Feature Templates -- 1.6 Discussion -- 1.7 Learning More
2 Features, Combined: Normalization, Discretization and Outliers -- 2.1 Normalizing Features -- 2.1.1 Standardization and Decorrelation -- 2.1.2 Smoothing -- 2.1.2.1 Probability Smoothing -- 2.1.3 Feature Weighting -- 2.1.3.1 Inverse Document Frequency Weighting -- 2.1.3.2 Camera Calibration -- 2.2 Discretization and Binning -- 2.2.1 Unsupervised Discretization -- 2.2.1.1 Binning -- 2.2.1.2 Clustering -- 2.2.2 Supervised Discretization -- 2.2.2.1 ChiMerge -- 2.2.2.2 MDLP -- 2.2.2.3 CAIM -- 2.3 Descriptive Features -- 2.3.1 Histograms -- 2.3.2 Other Descriptive Features -- 2.4 Dealing with Outliers -- 2.4.1 Outlier Detection -- 2.5 Advanced Techniques -- 2.6 Learning More
3 Features, Expanded: Computable Features, Imputation and Kernels -- 3.1 Computable Features -- 3.2 Imputation -- 3.3 Decomposing Complex Features -- 3.4 Kernel-Induced Feature Expansion -- 3.5 Learning More
4 Features, Reduced: Feature Selection, Dimensionality Reduction and Embeddings -- 4.1 Feature Selection -- 4.1.1 Metrics -- 4.1.1.1 Feature Utility Metrics -- 4.1.1.2 Multiple-Feature Metrics -- 4.1.1.3 Single-Feature Classifiers -- 4.1.1.4 Wrapper -- 4.1.2 Assembling the Feature Set: Search and Filter -- 4.1.2.1 Greedy -- 4.1.2.2 Stopping Criteria -- 4.1.3 Advanced Techniques -- 4.1.3.1 Stability -- 4.1.3.2 Blacklisting Features -- 4.2 Regularization and Embedded Feature Selection -- 4.2.1 L2 Regularization: Ridge Regression -- 4.2.2 L1 Regularization: LASSO -- 4.2.2.1 ElasticNet -- 4.2.3 Other Algorithms Featuring Embedded Feature Selection -- 4.3 Dimensionality Reduction -- 4.3.1 Hashing Features -- 4.3.2 Random Projection -- 4.3.3 SVD -- 4.3.4 Latent Dirichlet Allocation -- 4.3.5 Clustering -- 4.3.6 Other Dimensionality Reduction Techniques -- 4.3.7 Embeddings -- 4.3.7.1 Global Embeddings -- 4.3.7.2 Local Embeddings: Word2Vec -- 4.3.7.3 Other Embeddings: GloVe and ULMFiT -- 4.4 Learning More
5 Advanced Topics: Variable-Length Data and Automated Feature Engineering -- 5.1 Variable-Length Feature Vectors -- 5.1.1 Sets -- 5.1.2 Lists -- 5.1.3 Trees -- 5.1.4 Graphs -- 5.1.5 Time Series -- 5.1.5.1 Autocorrelation -- 5.1.5.2 Trend, Cycle and Seasonal Components -- 5.1.5.3 Stationarity -- 5.1.5.4 Time Series Models: ARIMA -- 5.2 Instance-Based Engineering -- 5.3 Deep Learning and Feature Engineering -- 5.3.1 RNNs -- 5.4 Automated Feature Engineering -- 5.4.1 Feature Learning -- 5.4.1.1 Convolutional Neural Networks -- 5.4.1.2 Featuretools -- 5.4.1.3 Genetic Programming -- 5.4.2 Unsupervised Feature Engineering -- 5.5 Learning More
Part Two: Case Studies
6 Graph Data -- 6.1 WikiCities Dataset -- 6.2 Exploratory Data Analysis (EDA) -- 6.3 First Feature Set -- 6.3.1 Error Analysis -- 6.3.1.1 Feature Ablation -- 6.3.1.2 Discretizing the Target -- 6.3.1.3 Feature Utility -- 6.3.1.4 Decision Trees for Feature Analysis -- 6.4 Second Feature Set -- 6.4.1 Feature Stability to Create a Conservative Feature Set -- 6.5 Final Feature Sets -- 6.5.1 Possible Followups -- 6.6 Learning More
7 Timestamped Data -- 7.1 WikiCities: Historical Features -- 7.1.1 Exploratory Data Analysis -- 7.2 Time Lagged Features -- 7.2.1 Imputing Timestamped Data -- 7.2.2 First Featurization: Imputed Lag-2 -- 7.2.2.1 Markov Assumption -- 7.2.2.2 Differential Features -- 7.2.3 Error Analysis -- 7.3 Sliding Windows -- 7.3.1 Second Featurization: Single Moving Average -- 7.4 Third Featurization: EMA -- 7.5 Historical Data as Data Expansion -- 7.5.1 Fourth Featurization: Expanded Data -- 7.5.2 Discussion -- 7.6 Time Series -- 7.6.1 WikiCountries Dataset -- 7.6.2 Exploratory Data Analysis -- 7.6.3 First Featurization: NoTS Features -- 7.6.4 Second Featurization: TS as Features -- 7.6.5 Using the Model Prediction as a Feature -- 7.6.6 Discussion -- 7.7 Learning More
8 Textual Data -- 8.1 WikiCities: Text -- 8.2 Exploratory Data Analysis -- 8.3 Numeric Tokens Only -- 8.3.1 Word Types versus Word Tokens -- 8.3.2 Tokenization: Basics -- 8.3.3 First Featurization -- 8.4 Bag-of-Words -- 8.4.1 Tokenization -- 8.4.2 Second Featurization -- 8.4.2.1 Error Analysis -- 8.5 Stop Words and Morphological Features -- 8.5.1 Stop Words -- 8.5.2 Tokenization: Stemming -- 8.5.3 Third Featurization -- 8.5.3.1 Error Analysis -- 8.6 Features in Context -- 8.6.1 Bigrams -- 8.6.2 Fourth Featurization -- 8.7 Skip Bigrams and Feature Hashing -- 8.7.1 Skip Bigram -- 8.7.2 Fifth Featurization -- 8.8 Dimensionality Reduction and Embeddings -- 8.8.1 Embeddings -- 8.8.2 Feature Weighting: TF-IDF -- 8.8.3 Sixth Featurization -- 8.9 Closing Remarks -- 8.9.1 Content Expansion -- 8.9.2 Structure in Text -- 8.10 Learning More
9 Image Data -- 9.1 WikiCities: Satellite Images -- 9.2 Exploratory Data Analysis -- 9.3 Pixels as Features -- 9.3.1 First Featurization -- 9.3.1.1 Error Analysis -- 9.3.2 Computable Features: Gaussian Blur -- 9.3.3 Whitening -- 9.3.4 Error Analysis on Variations -- 9.4 Automatic Dataset Expansion -- 9.4.1 Affine Transformations -- 9.4.2 Second Featurization -- 9.4.2.1 Error Analysis -- 9.5 Descriptive Features: Histograms -- 9.5.1 Third Featurization -- 9.5.1.1 Error Analysis -- 9.6 Local Feature Detectors: Corners -- 9.6.1 Harris Corner Detector -- 9.6.2 Fourth Featurization -- 9.6.2.1 Error Analysis -- 9.7 Dimensionality Reduction: HOGs -- 9.7.1 Fifth Featurization -- 9.8 Closing Remarks -- 9.8.1 Other Topics in Computer Vision -- 9.9 Learning More
10 Other Domains: Video, GIS and Preferences -- 10.1 Video -- 10.1.1 Data: Screencast -- 10.1.2 Key Frame Detection -- 10.1.3 Blobs Tracking: Mean-Shift -- 10.1.3.1 Histogram Back Projection -- 10.1.3.2 Mean-Shift -- 10.1.4 Learning More -- 10.2 Geographical Features -- 10.2.1 Data: Animal Migration -- 10.2.1.1 Radial Distance to Landmarks -- 10.2.1.2 Learning More -- 10.3 Preferences -- 10.3.1 Data: Linux Kernel Commits -- 10.3.2 Imputing Preferences -- 10.3.3 Learning More
Bibliography -- Index

Summary:
When machine learning engineers work with data sets, they may find the results aren't as good as they need. Instead of improving the model or collecting more data, they can use the feature engineering process to help improve results by modifying the data's features to better capture the nature of the problem. This practical guide to feature engineering is an essential addition to any data scientist's or machine learning engineer's toolbox, providing new ideas on how to improve the performance of a machine learning solution. Beginning with the basic concepts and techniques, the text builds up to a unique cross-domain approach that spans data on graphs, texts, time series, and images, with fully worked-out case studies. Key topics include binning, out-of-fold estimation, feature selection, dimensionality reduction, and encoding variable-length data. The full source code for the case studies is available on a companion website as Python Jupyter notebooks.
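As a purely illustrative aside (not taken from the book or its companion notebooks), one of the key topics named above, binning, can be sketched in a few lines of Python; the synthetic data, the choice of scikit-learn, and all parameter values below are assumptions made for the example only:

    # Minimal sketch of quantile binning (unsupervised discretization).
    # Synthetic data and parameter choices are illustrative assumptions only.
    import numpy as np
    from sklearn.preprocessing import KBinsDiscretizer

    rng = np.random.default_rng(0)
    income = rng.lognormal(mean=10, sigma=1, size=(1000, 1))  # skewed numeric feature

    # The quantile strategy puts roughly the same number of instances in each bin,
    # which tames the long tail before the feature reaches a model.
    binner = KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="quantile")
    income_binned = binner.fit_transform(income)

    print(binner.bin_edges_[0])                            # learned bin boundaries
    print(np.bincount(income_binned.ravel().astype(int)))  # ~200 instances per bin

Chapter 2 of the book covers binning alongside supervised discretization alternatives such as ChiMerge, MDLP and CAIM.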
Subjects
Genre
Dewey
ISBN
9781108709385 (pbk.) : £37.99 / £30.00 / £140.00
ISBN (invalid)
9781108573160 (ebook) ; 9781108671682 (PDF ebook)
