autoFS Module
- Description:
  - This module is used for feature selection:
    - Automates feature selection with several selectors
    - Evaluates the outputs of all selector methods and ranks a final list of the most important features
- Class:
  - dynaFS_clf : Focuses on classification problems
    - fit() : Fit-and-transform method for classification
  - dynaFS_reg : Focuses on regression problems
    - fit() : Fit-and-transform method for regression
- Current available selectors
  - clf_fs : Class for classification feature selection
    - kbest_f : SelectKBest() with f_classif core
    - kbest_chi2 : SelectKBest() with chi2 core
    - rfe_lr : RFE with LogisticRegression() estimator
    - rfe_svm : RFE with SVC() estimator
    - rfecv_svm : RFECV with SVC() estimator
    - rfe_tree : RFE with DecisionTreeClassifier() estimator
    - rfecv_tree : RFECV with DecisionTreeClassifier() estimator
    - rfe_rf : RFE with RandomForestClassifier() estimator
    - rfecv_rf : RFECV with RandomForestClassifier() estimator
  - reg_fs : Class for regression feature selection
    - kbest_f : SelectKBest() with f_regression core
    - rfe_svm : RFE with SVR() estimator
    - rfecv_svm : RFECV with SVR() estimator
    - rfe_tree : RFE with DecisionTreeRegressor() estimator
    - rfecv_tree : RFECV with DecisionTreeRegressor() estimator
    - rfe_rf : RFE with RandomForestRegressor() estimator
    - rfecv_rf : RFECV with RandomForestRegressor() estimator
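The selectors listed above wrap standard scikit-learn objects. As a rough illustration of what a few of the classification selectors do, the sketch below builds them directly with scikit-learn. The synthetic dataset, the feature names, the `selectors` dictionary, and the estimator settings (such as `max_iter=1000`) are assumptions made for this example and are not taken from the dynapipe source.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, chi2, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic data for illustration only: 8 features, 3 of them informative.
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=3, n_redundant=0, random_state=0
)
# chi2 requires non-negative inputs, so rescale every feature to [0, 1].
X = MinMaxScaler().fit_transform(X)
feature_names = np.array([f"f{i}" for i in range(X.shape[1])])

fs_num = 3  # number of features each selector should keep

# Hypothetical mapping from the selector names above to scikit-learn objects.
selectors = {
    "kbest_f": SelectKBest(score_func=f_classif, k=fs_num),
    "kbest_chi2": SelectKBest(score_func=chi2, k=fs_num),
    "rfe_lr": RFE(LogisticRegression(max_iter=1000), n_features_to_select=fs_num),
    "rfe_tree": RFE(DecisionTreeClassifier(random_state=0), n_features_to_select=fs_num),
}

selected = {}
for name, selector in selectors.items():
    selector.fit(X, y)
    # get_support() returns a boolean mask over the input columns.
    selected[name] = feature_names[selector.get_support()].tolist()
    print(name, selected[name])
```

Different selectors may disagree on which features to keep, which is why the module evaluates their outputs together before producing a final ranking.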
dynaFS_clf
class dynapipe.autoFS.dynaFS_clf(fs_num=None, random_state=None, cv=None, in_pipeline=False, input_from_file=True)
This class implements feature selection for classification problems.
Parameters:
- fs_num (int, default = None) – Number of features to select.
- random_state (int, default = None) – Random state value.
- cv (int, default = None) – Number of folds for cross-validation.
- in_pipeline (bool, default = False) – Set to True when using the autoPipe module to build Pipeline Cluster Traversal Experiments.
- input_from_file (bool, default = True) – Set to True when the input dataset is a dataframe; set to False otherwise (e.g., an array).
fit(tr_features, tr_labels)
Fits and transforms a dataframe with built-in algorithms to select the top features.
Parameters:
- tr_features (df, default = None) – Training feature columns. (NOTE: In the Pipeline Cluster Traversal Experiments, the feature columns should come from the same pipeline dataset.)
- tr_labels (array/df, default = None) – Training label column; must be a pandas dataframe when input_from_file = True. (NOTE: In the Pipeline Cluster Traversal Experiments, the label column should come from the same pipeline dataset.)
Returns:
- fs_num (int) – Number of top features selected.
- fs_results (array) – Names of the selected top features, in ranked order.
NOTE: Log records are generated and saved to the ./logs folder automatically.
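This page does not specify how the outputs of the individual selectors are combined into the ranked `fs_results` list. One plausible scheme, shown here purely as an assumption, is to count how many selectors picked each feature and keep the most frequently chosen ones. The `rank_by_votes` helper, the selector outputs, and the feature names below are all hypothetical.

```python
from collections import Counter


def rank_by_votes(selections, fs_num):
    """Rank features by how many selectors picked them.

    Ties are broken alphabetically so the result is deterministic.
    Returns (number of features kept, ranked feature names).
    """
    votes = Counter(name for picked in selections.values() for name in picked)
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    top = [name for name, _ in ranked[:fs_num]]
    return len(top), top


# Hypothetical per-selector outputs; the feature names are made up.
selections = {
    "kbest_f": ["age", "income", "tenure"],
    "rfe_lr": ["age", "tenure", "region"],
    "rfe_tree": ["income", "age", "score"],
}

fs_num, fs_results = rank_by_votes(selections, fs_num=3)
print(fs_num, fs_results)  # 3 ['age', 'income', 'tenure']
```

The return shape mirrors the documented `(fs_num, fs_results)` pair, but the voting rule itself is only a sketch and may differ from what dynapipe does internally.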
dynaFS_reg
class dynapipe.autoFS.dynaFS_reg(fs_num=None, random_state=None, cv=None, in_pipeline=False, input_from_file=True)
This class implements feature selection for regression problems.
Parameters:
- fs_num (int, default = None) – Number of features to select.
- random_state (int, default = None) – Random state value.
- cv (int, default = None) – Number of folds for cross-validation.
- in_pipeline (bool, default = False) – Set to True when using the autoPipe module to build Pipeline Cluster Traversal Experiments.
- input_from_file (bool, default = True) – Set to True when the input dataset is a dataframe; set to False otherwise (e.g., an array).
Example
https://dynamic-pipeline.readthedocs.io/en/latest/demos.html#features-selection-for-a-regression-problem-using-autoFS
fit(tr_features, tr_labels)
Fits and transforms a dataframe with built-in algorithms to select the top features.
Parameters:
- tr_features (df, default = None) – Training feature columns. (NOTE: In the Pipeline Cluster Traversal Experiments, the feature columns should come from the same pipeline dataset.)
- tr_labels (array/df, default = None) – Training label column; must be a pandas dataframe when input_from_file = True. (NOTE: In the Pipeline Cluster Traversal Experiments, the label column should come from the same pipeline dataset.)
Returns:
- fs_num (int) – Number of top features selected.
- fs_results (array) – Names of the selected top features, in ranked order.
NOTE: Log records are generated and saved to the ./logs folder automatically.
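For regression, the same idea applies with regression scorers and estimators. The sketch below fits three of the regression selectors named on this page directly with scikit-learn. It is an illustration only: the synthetic data, the feature names, the `reg_selectors` dictionary, and settings such as `n_estimators=20` are assumptions, not values taken from dynapipe.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE, SelectKBest, f_regression
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data for illustration: 6 features, 2 informative.
X, y = make_regression(
    n_samples=150, n_features=6, n_informative=2, noise=0.1, random_state=0
)
feature_names = np.array([f"f{i}" for i in range(X.shape[1])])

fs_num = 2  # number of features each selector should keep

# Hypothetical mapping from selector names to scikit-learn objects.
reg_selectors = {
    "kbest_f": SelectKBest(score_func=f_regression, k=fs_num),
    "rfe_tree": RFE(DecisionTreeRegressor(random_state=0), n_features_to_select=fs_num),
    "rfe_rf": RFE(
        RandomForestRegressor(n_estimators=20, random_state=0),
        n_features_to_select=fs_num,
    ),
}

reg_selected = {}
for name, selector in reg_selectors.items():
    selector.fit(X, y)
    # Map the boolean support mask back to feature names.
    reg_selected[name] = feature_names[selector.get_support()].tolist()
    print(name, reg_selected[name])
```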
clf_fs
class dynapipe.selectorFS.clf_fs(fs_num=None, random_state=None, cv=None)
This class stores classification selectors.
Parameters:
- fs_num (int, default = None) – Number of features to select.
- random_state (int, default = None) – Random state value.
- cv (int, default = None) – Number of folds for cross-validation.
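The `cv` parameter matters for the RFECV-based selectors, which use cross-validation to decide how many features to keep. A minimal scikit-learn sketch of that behaviour follows. Treating `fs_num` as a lower bound via `min_features_to_select` is an assumption made for this example, and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.tree import DecisionTreeClassifier

# Synthetic data for illustration: 8 features, 3 of them informative.
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=3, n_redundant=0, random_state=0
)

selector = RFECV(
    estimator=DecisionTreeClassifier(random_state=0),
    cv=5,                      # number of cross-validation folds
    min_features_to_select=3,  # assumed stand-in for fs_num
)
selector.fit(X, y)

# n_features_ is chosen by cross-validation, so it can exceed the minimum.
print("features kept:", selector.n_features_)
# ranking_ assigns 1 to every kept feature; larger values were dropped earlier.
print("ranking:", selector.ranking_.tolist())
```

Unlike plain RFE, which always returns exactly the requested number of features, RFECV may keep more than the minimum when cross-validated scores favour a larger subset.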
reg_fs
class dynapipe.selectorFS.reg_fs(fs_num, random_state=None, cv=None)
This class stores regression selectors.
Parameters:
- fs_num (int, default = None) – Number of features to select.
- random_state (int, default = None) – Random state value.
- cv (int, default = None) – Number of folds for cross-validation.