
lec0_Feature selection

by Minwoo 2019. 9. 14.

Table of Contents

    Methods: Filter / Wrapper / Embedded

    -----------------------------------------------------------------------------------------------------------------------------------

     

    Filter: Variance / Correlation / Univariate selection

     

    * Filter methods:

    1. rely on the characteristics of the data (feature characteristics)

    2. do not use ML algorithms

    3. are model agnostic

    4. tend to be less computationally expensive

    5. usually give lower prediction performance than wrapper methods

    6. are very well suited for quick screening and removal of irrelevant features

     

    : based on the data! No ML! computationally cheaper! lower prediction performance! good for removing irrelevant features!

     

    Two-step procedure:

    1. Rank features according to a certain criterion
       (each feature is ranked independently of the rest of the feature space)

    2. Select the highest-ranking features

    -> may select redundant variables, because the rankings don't consider relationships b/w features
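
    A minimal sketch of this two-step procedure with scikit-learn's SelectKBest (the dataset and k=10 are arbitrary choices for illustration):

    from sklearn.datasets import load_breast_cancer
    from sklearn.feature_selection import SelectKBest, f_classif

    X, y = load_breast_cancer(return_X_y=True)

    # Step 1: score every feature independently (ANOVA F-test here)
    # Step 2: keep the k highest-ranking features
    selector = SelectKBest(score_func=f_classif, k=10)
    X_selected = selector.fit_transform(X, y)
    print(X_selected.shape)  # (569, 10): 10 of the 30 original features remain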

     

    Ranking criteria:

    features are scored with various statistical tests:

    1. Chi-square | Fisher score

    2. Univariate parametric tests (ANOVA)

    3. Mutual Information

    4. Variance: Constant features / Quasi-constant features
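
    A short sketch of these criteria in scikit-learn; the 0.01 variance threshold is an assumed cut-off, not a standard value:

    from sklearn.datasets import load_breast_cancer
    from sklearn.feature_selection import (VarianceThreshold, chi2, f_classif,
                                           mutual_info_classif)

    X, y = load_breast_cancer(return_X_y=True)

    chi2_scores, chi2_p = chi2(X, y)       # chi-square (needs non-negative features)
    f_scores, f_p = f_classif(X, y)        # univariate parametric test (ANOVA F-test)
    mi_scores = mutual_info_classif(X, y)  # mutual information

    # Variance: drop constant / quasi-constant features
    vt = VarianceThreshold(threshold=0.01)
    X_reduced = vt.fit_transform(X)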

     

    Multivariate filter methods:

    1. handle redundant features, e.g. duplicated features and correlated features

    2. simple yet powerful methods to quickly remove irrelevant and redundant features

    3. often the first step in feature selection procedures

     

    Filter methods:

    1. quick dataset screening for irrelevant features

    2. quick removal of redundant features

       * Constant / Duplicated / Correlated features
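
    A minimal pandas sketch of this quick screening; the quick_screen name and the 0.8 correlation cut-off are illustrative assumptions:

    import numpy as np
    import pandas as pd

    def quick_screen(df, corr_threshold=0.8):
        # Constant features: a single unique value carries no information
        constant = [c for c in df.columns if df[c].nunique() == 1]
        df = df.drop(columns=constant)

        # Duplicated features: identical columns under different names
        df = df.loc[:, ~df.T.duplicated()]

        # Correlated features: drop one feature of each highly correlated pair
        corr = df.corr().abs()
        upper = corr.where(np.triu(np.ones(corr.shape), k=1).astype(bool))
        to_drop = [c for c in upper.columns if (upper[c] > corr_threshold).any()]
        return df.drop(columns=to_drop)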

     

    -----------------------------------------------------------------------------------------------------------------------------------

     

    Wrapper: Forward selection / Backward selection / Exhaustive search

     

    * Wrapper methods:

    1. use predictive ML models to score feature subsets

    2. train a new model on each feature subset

    3. tend to be very computationally expensive

    4. usually provide the best-performing feature subset for a given ML algorithm

    5. may not produce the best feature combination for a different ML model

     

    : use an ML model! train on each feature subset! computationally expensive! best-performing subset for a given ML algorithm!

     

    Detect interactions b/w variables / Find the optimal feature subset for the desired classifier

     

    Procedure:

    1. Search for a subset of features

    2. Build an ML model on the selected feature subset

    3. Evaluate model performance

    4. Repeat
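
    A sketch of this loop using scikit-learn's SequentialFeatureSelector (available since scikit-learn 0.24); the estimator and n_features_to_select=5 are arbitrary choices:

    from sklearn.datasets import load_breast_cancer
    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.linear_model import LogisticRegression

    X, y = load_breast_cancer(return_X_y=True)

    # Forward selection: start from an empty set and greedily add the feature
    # that most improves cross-validated performance; direction="backward"
    # starts from all features and removes instead
    sfs = SequentialFeatureSelector(LogisticRegression(max_iter=5000),
                                    n_features_to_select=5,
                                    direction="forward", cv=5)
    sfs.fit(X, y)
    print(sfs.get_support())  # boolean mask of the selected features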

     

    -----------------------------------------------------------------------------------------------------------------------------------

     

    Embedded: LASSO / Tree importance / Regression coefficients

     

    * Embedded methods:

    1. perform feature selection as part of the model construction process

    2. consider the interaction b/w features and models

    3. are less computationally expensive than wrapper methods, because they fit the ML model only once

     

    * Faster than wrapper methods / More accurate than filter methods / Detect interactions b/w variables / Find the feature subset for the algorithm being trained

     

    Procedure:

    1. Train an ML algorithm

    2. Derive the feature importances

    3. Remove unimportant features
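
    A sketch of both flavors with scikit-learn's SelectFromModel; the alpha value and the estimators are illustrative assumptions:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import SelectFromModel
    from sklearn.linear_model import Lasso
    from sklearn.preprocessing import StandardScaler

    X, y = load_breast_cancer(return_X_y=True)

    # LASSO: the L1 penalty drives the coefficients of irrelevant features to
    # zero; SelectFromModel keeps the features with non-zero coefficients
    X_scaled = StandardScaler().fit_transform(X)
    lasso_sel = SelectFromModel(Lasso(alpha=0.01))  # alpha is an assumed value
    X_lasso = lasso_sel.fit_transform(X_scaled, y)

    # Tree importance: keep features whose importance is above the mean
    rf_sel = SelectFromModel(RandomForestClassifier(n_estimators=100,
                                                    random_state=0))
    X_rf = rf_sel.fit_transform(X, y)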
