autoFS Module

Description:
  • This module is used for feature selection:
    • Automates feature selection with several selectors
    • Evaluates the outputs from all selector methods and ranks a final list of the most important features
  • Classes:
    • dynaFS_clf : Focuses on classification problems
      • fit() - fit and transform method for the classifier
    • dynaFS_reg : Focuses on regression problems
      • fit() - fit and transform method for the regressor
  • Currently available selectors
    • clf_fs : Class focusing on classification feature selection
      • kbest_f - SelectKBest() with f_classif core
      • kbest_chi2 - SelectKBest() with chi2 core
      • rfe_lr - RFE with LogisticRegression() estimator
      • rfe_svm - RFE with SVC() estimator
      • rfecv_svm - RFECV with SVC() estimator
      • rfe_tree - RFE with DecisionTreeClassifier() estimator
      • rfecv_tree - RFECV with DecisionTreeClassifier() estimator
      • rfe_rf - RFE with RandomForestClassifier() estimator
      • rfecv_rf - RFECV with RandomForestClassifier() estimator
    • reg_fs : Class focusing on regression feature selection
      • kbest_f - SelectKBest() with f_regression core
      • rfe_svm - RFE with SVR() estimator
      • rfecv_svm - RFECV with SVR() estimator
      • rfe_tree - RFE with DecisionTreeRegressor() estimator
      • rfecv_tree - RFECV with DecisionTreeRegressor() estimator
      • rfe_rf - RFE with RandomForestRegressor() estimator
      • rfecv_rf - RFECV with RandomForestRegressor() estimator
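
The ranking step described above (combining the outputs of all selectors into one final list) could be approximated by a simple vote count. This is only an illustrative sketch: the rule dynapipe actually uses to combine selector outputs is not stated here, so the voting rule, the function name rank_by_votes, and the sample feature names are all assumptions.

```python
from collections import Counter


def rank_by_votes(selector_outputs, fs_num):
    """Rank features by how many selectors picked them (assumed rule).

    selector_outputs maps a selector name to the list of feature names it kept.
    Ties are broken alphabetically so the result is deterministic.
    """
    votes = Counter()
    for selected in selector_outputs.values():
        # Each selector contributes at most one vote per feature.
        votes.update(set(selected))
    ranked = sorted(votes, key=lambda name: (-votes[name], name))
    return ranked[:fs_num]


# Hypothetical outputs from three selectors.
outputs = {
    "kbest_f": ["age", "income", "tenure"],
    "rfe_lr": ["income", "tenure", "region"],
    "rfe_tree": ["income", "age", "score"],
}
top = rank_by_votes(outputs, fs_num=3)
print(top)
```

A vote count is the simplest aggregation that needs only the selected feature names; a score-weighted scheme would need each selector's ranking or importance values as well.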

dynaFS_clf

class dynapipe.autoFS.dynaFS_clf(fs_num=None, random_state=None, cv=None, in_pipeline=False, input_from_file=True)[source]

This class implements feature selection for classification problems.

Parameters:
  • fs_num (int, default = None) – Number of features to select.
  • random_state (int, default = None) – Random state value.
  • cv (int, default = None) – Number of folds for cross-validation.
  • in_pipeline (bool, default = False) – Set to “True” when using the autoPipe module to build Pipeline Cluster Traversal Experiments.
  • input_from_file (bool, default = True) – Set to “True” when the input dataset is a dataframe; otherwise (e.g. an array), set to “False”.

Example

References

None

fit(tr_features, tr_labels)[source]

Fits and transforms a dataframe with the built-in algorithms to select the top features.

Parameters:
  • tr_features (df, default = None) – Training feature columns. (NOTE: In the Pipeline Cluster Traversal Experiments, the feature columns should come from the same pipeline dataset.)
  • tr_labels (array/df, default = None) – Training label column; must be a pandas dataframe when input_from_file = True. (NOTE: In the Pipeline Cluster Traversal Experiments, the label column should come from the same pipeline dataset.)
Returns:

  • fs_num (int) – Number of top features selected.
  • fs_results (array) – Selected and ranked top feature names.
  • NOTE - Log records are generated and saved to the ./logs folder automatically.
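
To make the shape of the return values concrete, here is a rough scikit-learn sketch of a fit step that runs a few of the classification selectors listed earlier and returns a (fs_num, fs_results) pair. It does not call dynapipe and is not its implementation: the function select_top_features, the choice of three selectors, the vote-count ranking, and the synthetic data are all assumptions made for illustration.

```python
from collections import Counter

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier


def select_top_features(X, y, feature_names, fs_num, random_state=None):
    """Run several selectors and rank features by vote count (assumed rule)."""
    selectors = [
        SelectKBest(f_classif, k=fs_num),
        RFE(LogisticRegression(max_iter=1000, random_state=random_state),
            n_features_to_select=fs_num),
        RFE(DecisionTreeClassifier(random_state=random_state),
            n_features_to_select=fs_num),
    ]
    votes = Counter()
    for selector in selectors:
        # get_support() returns a boolean mask over the input columns.
        mask = selector.fit(X, y).get_support()
        votes.update(name for name, keep in zip(feature_names, mask) if keep)
    ranked = sorted(votes, key=lambda name: (-votes[name], name))
    return fs_num, np.array(ranked[:fs_num])


# Synthetic data standing in for tr_features / tr_labels.
X, y = make_classification(n_samples=200, n_features=8, n_informative=3,
                           n_redundant=0, random_state=13)
feature_names = [f"f{i}" for i in range(X.shape[1])]

n_selected, fs_results = select_top_features(X, y, feature_names,
                                             fs_num=3, random_state=13)
print(n_selected, fs_results)
```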

dynaFS_reg

class dynapipe.autoFS.dynaFS_reg(fs_num=None, random_state=None, cv=None, in_pipeline=False, input_from_file=True)[source]

This class implements feature selection for regression problems.

Parameters:
  • fs_num (int, default = None) – Number of features to select.
  • random_state (int, default = None) – Random state value.
  • cv (int, default = None) – Number of folds for cross-validation.
  • in_pipeline (bool, default = False) – Set to “True” when using the autoPipe module to build Pipeline Cluster Traversal Experiments.
  • input_from_file (bool, default = True) – Set to “True” when the input dataset is a dataframe; otherwise (e.g. an array), set to “False”.

Example

[Example] https://dynamic-pipeline.readthedocs.io/en/latest/demos.html#features-selection-for-a-regression-problem-using-autoFS

References

None

fit(tr_features, tr_labels)[source]

Fits and transforms a dataframe with the built-in algorithms to select the top features.

Parameters:
  • tr_features (df, default = None) – Training feature columns. (NOTE: In the Pipeline Cluster Traversal Experiments, the feature columns should come from the same pipeline dataset.)
  • tr_labels (array/df, default = None) – Training label column; must be a pandas dataframe when input_from_file = True. (NOTE: In the Pipeline Cluster Traversal Experiments, the label column should come from the same pipeline dataset.)
Returns:

  • fs_num (int) – Number of top features selected.
  • fs_results (array) – Selected and ranked top feature names.
  • NOTE - Log records are generated and saved to the ./logs folder automatically.
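
The regression selectors come in RFE and RFECV variants, and the difference affects how many features come back. The scikit-learn sketch below contrasts the two with a DecisionTreeRegressor estimator: RFE keeps exactly the requested number of features, while RFECV uses cross-validation to choose the count. Passing fs_num as min_features_to_select and cv as the fold count is an assumption about how the parameters might map onto scikit-learn, not a statement of how dynapipe wires them; the data is synthetic.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE, RFECV
from sklearn.tree import DecisionTreeRegressor

fs_num, random_state, cv = 3, 13, 5

# Synthetic regression data standing in for tr_features / tr_labels.
X, y = make_regression(n_samples=150, n_features=8, n_informative=3,
                       noise=0.1, random_state=random_state)

# RFE: eliminate features recursively until exactly fs_num remain.
rfe_tree = RFE(DecisionTreeRegressor(random_state=random_state),
               n_features_to_select=fs_num)
rfe_tree.fit(X, y)

# RFECV: let cross-validation pick the count, never going below fs_num
# (assumed mapping of fs_num to min_features_to_select).
rfecv_tree = RFECV(DecisionTreeRegressor(random_state=random_state),
                   min_features_to_select=fs_num, cv=cv)
rfecv_tree.fit(X, y)

print("RFE kept:", rfe_tree.n_features_)
print("RFECV kept:", rfecv_tree.n_features_)
```

Because RFECV may keep more than fs_num features, any ranking built on top of it has to tolerate selectors that return lists of different lengths.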

clf_fs

class dynapipe.selectorFS.clf_fs(fs_num=None, random_state=None, cv=None)[source]

This class stores classification selectors.

Parameters:
  • fs_num (int, default = None) – Number of features to select.
  • random_state (int, default = None) – Random state value.
  • cv (int, default = None) – Number of folds for cross-validation.
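
As a rough picture of what a class that "stores selectors" might look like, the sketch below defines a hypothetical container, ClfSelectorsSketch, whose methods return unfitted scikit-learn selectors configured from the three parameters above. It is a stand-in written for illustration, not dynapipe's clf_fs: the class name, the subset of methods shown, and the linear SVC kernel are assumptions. Two scikit-learn facts are worth noting: chi2 requires non-negative feature values, and RFE needs an estimator that exposes coef_ or feature_importances_, which for SVC means a linear kernel.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE, SelectKBest, chi2, f_classif
from sklearn.svm import SVC


class ClfSelectorsSketch:
    """Hypothetical selector container; NOT dynapipe's clf_fs."""

    def __init__(self, fs_num=None, random_state=None, cv=None):
        self.fs_num = fs_num
        self.random_state = random_state
        self.cv = cv  # kept for RFECV-style selectors, unused in this sketch

    def kbest_f(self):
        return SelectKBest(f_classif, k=self.fs_num)

    def kbest_chi2(self):
        # chi2 only accepts non-negative feature values.
        return SelectKBest(chi2, k=self.fs_num)

    def rfe_svm(self):
        # A linear kernel exposes coef_, which RFE uses to rank features.
        estimator = SVC(kernel="linear", random_state=self.random_state)
        return RFE(estimator, n_features_to_select=self.fs_num)


# Iris has four non-negative features, so all three selectors apply.
X, y = load_iris(return_X_y=True)
selectors = ClfSelectorsSketch(fs_num=2, random_state=13)

masks = {}
for method in ("kbest_f", "kbest_chi2", "rfe_svm"):
    selector = getattr(selectors, method)()
    masks[method] = selector.fit(X, y).get_support()
    print(method, masks[method])
```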

Example

[Example]

References

None

reg_fs

class dynapipe.selectorFS.reg_fs(fs_num, random_state=None, cv=None)[source]

This class stores regression selectors.

Parameters:
  • fs_num (int, default = None) – Number of features to select.
  • random_state (int, default = None) – Random state value.
  • cv (int, default = None) – Number of folds for cross-validation.

Example

[Example]

References

None