ML Performance Improvement Cheat Sheet
(a). Improve Performance With Data
(b). Improve Performance With Algorithms
(c). Improve Performance With Tuning
(d). Improve Performance With Ensembles
Process:
1. Pick one group: (a) Data, (b) Algorithms, (c) Tuning, (d) Ensembles.
2. Pick one method from the group.
3. Pick one thing to try from the chosen method.
4. Compare the results and keep the change only if it improves performance.
5. Repeat.
Improve Performance With Data
1. Get More Data
2. Invent More Data
3. Clean Your Data:
- e.g., missing, corrupt, or outlier values need to be removed or imputed
4. Resample Data:
- oversample the minority class or undersample the majority class to balance class frequencies (items 3-4 are sketched in code at the end of this list)
5. Reframe Your Problem:
- recast the problem as regression, binary or multiclass classification, time series forecasting, anomaly detection, rating, recommendation, etc.
6. Rescale Your Data:
- rescale numeric input variables (normalization or standardization)
7. Transform Your Data
- apply functions such as log or exponential transforms to make the data more Gaussian-like, which may better expose features
8. Project Your Data
- project data into a lower-dimensional space (e.g., PCA)
9. Feature Selection
- use feature importance estimates or feature selection methods to decide which features to keep (items 6-9 are combined in the pipeline sketch at the end of this list)
10. Feature Engineering
- create and add new features (attributes aggregated to signify an event, such as a count, binary flag, or statistical summary)
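A minimal sketch of items 3-4 (cleaning and resampling) using pandas and scikit-learn; the tiny DataFrame and its column names are made up purely for illustration:

    import pandas as pd
    from sklearn.utils import resample

    # Hypothetical toy data: one numeric feature with a missing value and an
    # outlier, plus an imbalanced binary target.
    df = pd.DataFrame({
        "feature": [1.0, 2.0, None, 4.0, 1000.0, 3.0, 2.5, 3.5],
        "target":  [0,   0,   0,    0,   0,      0,   1,   1],
    })

    # 3. Clean: impute the missing value and clip extreme outliers.
    df["feature"] = df["feature"].fillna(df["feature"].median())
    low, high = df["feature"].quantile([0.05, 0.95])
    df["feature"] = df["feature"].clip(low, high)

    # 4. Resample: oversample the minority class to balance class frequencies.
    majority = df[df["target"] == 0]
    minority = df[df["target"] == 1]
    minority_up = resample(minority, replace=True,
                           n_samples=len(majority), random_state=42)
    balanced = pd.concat([majority, minority_up])
    print(balanced["target"].value_counts())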
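Items 6-9 (rescaling, transforming, projecting, and feature selection) can be chained in a single scikit-learn pipeline. A minimal sketch on synthetic data; the step order and parameter values are illustrative assumptions, not recommendations:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = make_classification(n_samples=500, n_features=20, random_state=42)

    # 7. Transform: log1p on non-negative values makes skewed features more
    #    Gaussian-like (applied to |X| purely for illustration).
    X = np.log1p(np.abs(X))

    pipe = Pipeline([
        ("scale", StandardScaler()),               # 6. rescale (standardization)
        ("select", SelectKBest(f_classif, k=10)),  # 9. feature selection
        ("project", PCA(n_components=5)),          # 8. project to lower dimension
        ("model", LogisticRegression(max_iter=1000)),
    ])
    pipe.fit(X, y)
    print(pipe.score(X, y))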
Improve Performance With Algorithms
1. Resampling Method
- e.g., k-fold cross-validation with a held-out validation dataset (see the cross-validation sketch at the end of this list)
2. Evaluation Metric:
- Accuracy is a valid choice of evaluation metric for classification problems that are well balanced, with no class imbalance
- Precision is a valid choice of evaluation metric when we want to be very sure of our prediction
- Recall is a valid choice of evaluation metric when we want to capture as many positives as possible
- F1-score is a number between 0 and 1, the harmonic mean of precision and recall; it maintains a balance between precision and recall for your classifier
- AUC is the area under the ROC curve; it indicates how well the predicted probabilities of the positive class are separated from those of the negative class
* AUC is scale-invariant: it measures how well predictions are ranked rather than their absolute values. For example, if a marketer wants a list of users who will respond to a campaign, AUC is a good metric, since ranking predictions by probability gives the order in which to contact those users.
* AUC is also classification-threshold-invariant, like log loss: it measures the quality of the model's predictions irrespective of the chosen classification threshold, unlike F1 score or accuracy, which depend on that choice.
- Log loss / binary cross-entropy takes into account the uncertainty of your predictions based on how much they vary from the actual labels; it is sensitive to class imbalance
3. Baseline Performance
- Use a random or zero-rule algorithm (predict the mean or mode) to establish a baseline by which to rank all evaluated algorithms (see the baseline and spot-check sketch at the end of this list)
4. Spot-Check Linear Algorithms
- Linear methods often have higher bias, are easy to understand, and are fast to train
- Evaluate a diverse suite of linear methods
5. Spot-Check Nonlinear Algorithms
- Nonlinear algorithms often require more data, have greater complexity but can achieve better performance
- Evaluate a diverse suite of nonlinear methods
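A minimal sketch of items 1-2: evaluate one model with stratified k-fold cross-validation and several of the metrics above at once (synthetic, imbalanced data used only for illustration):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold, cross_validate

    # Imbalanced synthetic data (about 80% / 20%) to make the metric choice matter.
    X, y = make_classification(n_samples=1000, weights=[0.8, 0.2], random_state=7)

    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=7)
    scores = cross_validate(
        LogisticRegression(max_iter=1000), X, y, cv=cv,
        scoring=["accuracy", "precision", "recall", "f1", "roc_auc", "neg_log_loss"],
    )
    for name, values in scores.items():
        if name.startswith("test_"):
            print(name, round(values.mean(), 3))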
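Items 3-5 in one loop: a zero-rule baseline followed by a spot-check of a few linear and nonlinear algorithms. The particular models here are just an assumed starting suite:

    from sklearn.datasets import make_classification
    from sklearn.dummy import DummyClassifier
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression, RidgeClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=1000, random_state=7)

    models = {
        "baseline (most frequent)": DummyClassifier(strategy="most_frequent"),  # 3. zero rule
        "logistic regression":      LogisticRegression(max_iter=1000),          # 4. linear
        "ridge classifier":         RidgeClassifier(),                          # 4. linear
        "svm (rbf kernel)":         SVC(),                                      # 5. nonlinear
        "k-nearest neighbors":      KNeighborsClassifier(),                     # 5. nonlinear
        "random forest":            RandomForestClassifier(random_state=7),     # 5. nonlinear
    }
    for name, model in models.items():
        acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
        print(f"{name}: {acc:.3f}")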
Improve Performance With Tuning
* You can often unearth one or two well-performing algorithms quickly from spot-checking
* Get the most out of well-performing machine learning algorithms
1. Diagnostics
- review learning curves to understand whether the method is overfitting or underfitting the problem, then correct for it (see the learning-curve sketch at the end of this list)
- Different algorithms may offer different visualizations and diagnostics
- Review what the algorithm is predicting right and wrong
2. Try Intuition
3. Steal from Literature
4. Random Search
- use random search of algorithm hyperparameters to expose configurations that you would never think to try
5. Grid Search
- there are grids of standard hyperparameter values that you can enumerate to find good configurations, then repeat the process with finer and finer grids (items 4-5 are sketched at the end of this list)
6. Optimize:
- there are parameters like structure or learning rate that can be tuned using a direct search procedure (like pattern search) or stochastic optimization (like a genetic algorithm).
7. Alternate Implementations
- an alternate implementation of the method may achieve better results on the same data
- Each algorithm has a myriad of micro-decisions that must be made by the algorithm implementor
- Some of these decisions may affect skill on your problem
8. Algorithm Extensions
- lift performance by evaluating common or standard extensions to the method
- This may require implementation work
9. Algorithm Customizations
- there are modifications that you can make to the algorithm for your data, from loss function, internal optimization methods to algorithm specific decisions
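A minimal sketch of item 1: compare training and validation scores across increasing training-set sizes with scikit-learn's learning_curve (synthetic data and an arbitrary model chosen only for illustration):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import learning_curve

    X, y = make_classification(n_samples=2000, random_state=0)

    # A large, persistent gap between the two curves suggests overfitting;
    # two low, converging curves suggest underfitting.
    sizes, train_scores, val_scores = learning_curve(
        RandomForestClassifier(random_state=0), X, y,
        cv=5, train_sizes=np.linspace(0.1, 1.0, 5),
    )
    for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
        print(f"n={n}: train={tr:.3f} validation={va:.3f}")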
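A minimal sketch of items 4-5: a broad random search first, then a finer grid search around promising values. The hyperparameter ranges are arbitrary assumptions for illustration:

    from scipy.stats import randint
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

    X, y = make_classification(n_samples=1000, random_state=0)

    # 4. Random search: sample broad ranges to expose unexpected configurations.
    random_search = RandomizedSearchCV(
        RandomForestClassifier(random_state=0),
        param_distributions={"n_estimators": randint(50, 500),
                             "max_depth": randint(2, 20)},
        n_iter=20, cv=5, random_state=0,
    )
    random_search.fit(X, y)
    print("random search:", random_search.best_params_)

    # 5. Grid search: enumerate a standard grid, then repeat with finer grids
    #    around the best configuration found so far.
    grid_search = GridSearchCV(
        RandomForestClassifier(random_state=0),
        param_grid={"n_estimators": [100, 200, 300],
                    "max_depth": [5, 10, None]},
        cv=5,
    )
    grid_search.fit(X, y)
    print("grid search:", grid_search.best_params_, round(grid_search.best_score_, 3))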
Improve Performance With Ensembles
* combine the predictions from multiple models. After algorithm tuning, this is the next big area for improvement.
* You often get better performance by combining the predictions from multiple good-enough models than from multiple highly tuned (and fragile) models.
1. Blend Model Predictions
- use the same or different algorithms to make multiple models, then take the mean or mode of the predictions of multiple well-performing models (see the voting and bagging sketch at the end of this list)
2. Blend Data Representations
- combine predictions from models trained on different data representations
3. Blend Data Samples :
- combine models trained on different views of data
- create multiple subsamples of your training data and train a well-performing algorithm, then combine predictions
- This is called bootstrap aggregation or bagging and works best when the predictions from each model are skillful but in different ways (uncorrelated)
4. Correct Predictions :
- correct the predictions of well-performing models
- explicitly correct the predictions, or use a method like boosting to learn how to correct prediction errors (see the boosting and stacking sketch at the end of this list)
5. Learn to Combine :
- use a new model to learn how to best combine the predictions from multiple well-performing models
- This is called stacked generalization or stacking and often works well when the submodels are skillful but in different ways and the aggregator model is a simple linear weighting of the predictions
- This process can be repeated multiple layers deep
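A minimal sketch of items 1 and 3: a hard-voting blend of different algorithms, and bagging over bootstrap subsamples of the training data (synthetic data and arbitrary base models, for illustration only):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier, VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=1000, random_state=1)

    # 1. Blend model predictions: majority vote (mode) across different algorithms.
    vote = VotingClassifier(estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("nb", GaussianNB()),
        ("tree", DecisionTreeClassifier(random_state=1)),
    ])

    # 3. Blend data samples: bagging trains one tree per bootstrap subsample
    #    and combines their predictions.
    bag = BaggingClassifier(DecisionTreeClassifier(random_state=1),
                            n_estimators=50, random_state=1)

    for name, model in [("voting", vote), ("bagging", bag)]:
        print(name, round(cross_val_score(model, X, y, cv=5).mean(), 3))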
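A minimal sketch of items 4-5: boosting (each new tree focuses on the errors of the ensemble so far) and stacking (a simple linear model learns to combine the base models' out-of-fold predictions). The base models are arbitrary choices for illustration:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=1000, random_state=1)

    # 4. Correct predictions: gradient boosting fits each new tree to the
    #    errors of the current ensemble.
    boost = GradientBoostingClassifier(random_state=1)

    # 5. Learn to combine: stacking trains a logistic regression on the
    #    out-of-fold predictions of the base models.
    stack = StackingClassifier(
        estimators=[("svm", SVC(probability=True)),
                    ("knn", KNeighborsClassifier())],
        final_estimator=LogisticRegression(),
        cv=5,
    )

    for name, model in [("boosting", boost), ("stacking", stack)]:
        print(name, round(cross_val_score(model, X, y, cv=5).mean(), 3))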