autoCV Module¶

Description :

This module is used for model selection. It:
- Automates model training with cross-validation
- Grid-searches for the best hyperparameters
- Exports the optimized models as pkl files and saves them in the ./pkl folder
- Validates the optimized models and selects the best model

Classes
- dynaClassifier : Focuses on classification problems
  
  fit() : fit method for classifier
- dynaRegressor : Focuses on regression problems
  
  fit() : fit method for regressor

Current available estimators

clf_cv : Class focusing on classification estimators

lgr : Logistic Regression (aka logit, MaxEnt) classifier - LogisticRegression()

svm : C-Support Vector Classification - svm.SVC()

mlp : Multi-layer Perceptron classifier - MLPClassifier()

ada : An AdaBoost classifier - AdaBoostClassifier()

rf : Random Forest classifier - RandomForestClassifier()

gb : Gradient Boost classifier - GradientBoostingClassifier()

xgb : XGBoost classifier - xgb.XGBClassifier()

reg_cv : Class focusing on regression estimators

lr : Linear Regression - LinearRegression()

knn : Regression based on k-nearest neighbors - KNeighborsRegressor()

svr : Epsilon-Support Vector Regression - svm.SVR()

rf : Random Forest Regression - RandomForestRegressor()

ada : An AdaBoost regressor - AdaBoostRegressor()

gb : Gradient Boosting for regression - GradientBoostingRegressor()

tree : A decision tree regressor - DecisionTreeRegressor()

mlp : Multi-layer Perceptron regressor - MLPRegressor()

xgb : XGBoost regression - XGBRegressor()

hgboost : Hist Gradient Boosting regression - HistGradientBoostingRegressor() (added on 8/7/2020)

huber : Huber regression - HuberRegressor() (added on 8/7/2020)

rgcv : Ridge cross validation regression - RidgeCV() (added on 8/7/2020)

cvlasso : Lasso cross validation regression - LassoCV() (added on 8/7/2020)

sgd : Stochastic Gradient Descent regression - SGDRegressor() (added on 8/7/2020)

dynaClassifier¶

class dynapipe.autoCV.dynaClassifier(custom_estimators=None, random_state=13, cv_num=5, in_pipeline=False, input_from_file=True)[source]¶

This class implements classification model selection with hyperparameter grid search and cross-validation.
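
The combination of grid search and cross-validation that this class automates can be illustrated with a small sketch that uses only the standard library. It is not dynapipe's implementation: the 1-D k-nearest-neighbour model, the helper names (k_fold_indices, knn_predict, grid_search_cv), and the toy data are all assumptions made for illustration.

```python
from itertools import product
from statistics import mean


def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, val_idx) for contiguous, near-equal folds."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        val_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if not start <= i < start + size]
        yield train_idx, val_idx
        start += size


def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k training points closest to x (1-D)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes = [train_y[i] for i in nearest]
    return max(sorted(set(votes)), key=votes.count)


def grid_search_cv(x, y, param_grid, n_folds=5):
    """Return (best_params, best_mean_accuracy) over every grid combination."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_scores = []
        for train_idx, val_idx in k_fold_indices(len(x), n_folds):
            train_x = [x[i] for i in train_idx]
            train_y = [y[i] for i in train_idx]
            hits = sum(
                knn_predict(train_x, train_y, x[i], **params) == y[i]
                for i in val_idx
            )
            fold_scores.append(hits / len(val_idx))
        score = mean(fold_scores)
        if score > best_score:  # the first combination wins on ties
            best_params, best_score = params, score
    return best_params, best_score


# Toy data: class 0 below 0.5, class 1 above, interleaved so every fold
# contains one sample of each class.
x = [0.0, 1.0, 0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
best_params, best_score = grid_search_cv(x, y, {"k": [1, 3, 5]}, n_folds=5)
print(best_params, best_score)
```

Each parameter combination is scored by its mean accuracy over the folds, and the combination with the highest mean is kept. The same idea applies to every estimator and search range the class iterates over.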

Parameters:

custom_estimators (list, default = None) – Custom set of estimators to use in the autoCV classification module (if None, all available estimators are used). The current version's default available estimators are ['lgr', 'svm', 'mlp', 'rf', 'ada', 'gb', 'xgb'].
random_state (int, default = 13) – Random state value.
cv_num (int, default = 5) – # of folds for cross-validation.
in_pipeline (bool, default = False) – Should be set to "True" when using the autoPipe module to build Pipeline Cluster Traversal Experiments.
input_from_file (bool, default = True) – Set to "True" when the input dataset is a df; otherwise (e.g. an array), set to "False".

Example

[Example]

https://dynamic-pipeline.readthedocs.io/en/latest/demos.html#model-selection-for-a-classification-problem-using-autocv

fit(tr_features=None, tr_labels=None)[source]¶

Fit and train datasets with classification hyperparameter grid search and cross-validation across multiple estimators. The module automatically saves each trained model as {estimator_name}_clf_model.pkl in the ./pkl folder.

Parameters:

tr_features (df, default = None) – Train features columns. ( NOTE: In the Pipeline Cluster Traversal Experiments, the features columns should be from the same pipeline dataset).
tr_labels (df, default = None) – Train label column. ( NOTE: In the Pipeline Cluster Traversal Experiments, the label column should be from the same pipeline dataset).

Returns:

cv_num (int) – # of folds for cross-validation.
DICT_EST (dictionary) – Key is the name of the estimator; value is the related trained model.
NOTE - Trained models are saved automatically only when in_pipeline = "False".
NOTE - Log records are generated and saved to the ./logs folder automatically.
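
The file naming described for fit() can be sketched with the standard pickle module. This is an illustration under assumptions, not the module's own code: save_models and load_model are hypothetical helpers, plain dictionaries stand in for trained estimators, and a temporary folder replaces ./pkl so the sketch leaves nothing behind in the working directory.

```python
import pickle
import tempfile
from pathlib import Path


def save_models(dict_est, out_dir, kind="clf"):
    """Pickle each model to <out_dir>/<name>_<kind>_model.pkl; return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for name, model in dict_est.items():
        path = out_dir / f"{name}_{kind}_model.pkl"
        with path.open("wb") as handle:
            pickle.dump(model, handle)
        paths[name] = path
    return paths


def load_model(path):
    """Load a pickled model back from disk."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


# Stand-ins for trained estimators; any picklable object works here.
dict_est = {"lgr": {"C": 1}, "svm": {"C": 10, "kernel": "rbf"}}
pkl_dir = Path(tempfile.mkdtemp()) / "pkl"  # temporary folder instead of ./pkl
saved = save_models(dict_est, pkl_dir)
restored = load_model(saved["svm"])
```

Passing kind="reg" would produce the {estimator_name}_reg_model.pkl names used for regressors. Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.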

dynaRegressor¶

class dynapipe.autoCV.dynaRegressor(custom_estimators=None, random_state=25, cv_num=5, in_pipeline=False, input_from_file=True)[source]¶

This class implements regression model selection with hyperparameter grid search and cross-validation. The module automatically saves each trained model as {estimator_name}_reg_model.pkl in the ./pkl folder.

Parameters:

custom_estimators (list, default = None) – Custom set of estimators to use in the autoCV regression module (if None, all available estimators are used). The current version's default available estimators are ['lr', 'knn', 'tree', 'svm', 'mlp', 'rf', 'gb', 'ada', 'xgb', 'hgboost', 'huber', 'rgcv', 'cvlasso', 'sgd'].
random_state (int, default = 25) – Random state value.
cv_num (int, default = 5) – # of folds for cross-validation.
in_pipeline (bool, default = False) – Should be set to "True" when using the autoPipe module to build Pipeline Cluster Traversal Experiments.
input_from_file (bool, default = True) – Set to "True" when the input dataset is a df; otherwise (e.g. an array), set to "False".

Example

[Example]

https://dynamic-pipeline.readthedocs.io/en/latest/demos.html#model-selection-for-a-regression-problem-using-autocv

fit(tr_features=None, tr_labels=None)[source]¶

Fit and train datasets with regression hyperparameter grid search and cross-validation across multiple estimators.

Parameters:

tr_features (df, default = None) – Train features columns. ( NOTE: In the Pipeline Cluster Traversal Experiments, the features columns should be from the same pipeline dataset).
tr_labels (df, default = None) – Train label column. ( NOTE: In the Pipeline Cluster Traversal Experiments, the label column should be from the same pipeline dataset).

Returns:

cv_num (int) – # of folds for cross-validation.
DICT_EST (dictionary) – Key is the name of the estimator; value is the related trained model.
NOTE - Trained models are saved automatically only when in_pipeline = "False".
NOTE - Log records are generated and saved to the ./logs folder automatically.

evaluate_model¶

class dynapipe.autoCV.evaluate_model(model_type=None, in_pipeline=False)[source]¶

This class implements model evaluation and returns the key score results.

Parameters:

model_type (str, default = None) – Value in ["reg", "cls"]. Use "reg" for regression problems and "cls" for classification problems.
in_pipeline (bool, default = False) – Should be set to "True" when using the autoPipe module to build Pipeline Cluster Traversal Experiments.

Example

[Example]

https://dynamic-pipeline.readthedocs.io/en/latest/demos.html#build-pipeline-cluster-traveral-experiments-using-autopipe

fit(name=None, model=None, features=None, labels=None)[source]¶

Evaluates models on the validation datasets.

Parameters:

name (str, default = None) – Estimator name.
model (pkl, default = None) – Model to evaluate. Requires a pkl file as input when in_pipeline = "False"; otherwise, use DICT_EST[estimator name] as the input.
features (df, default = None) – Validate features columns. ( NOTE: In the Pipeline Cluster Traversal Experiments, the features columns should be from the same pipeline dataset).
labels (df, default = None) – Validate label column. ( NOTE: In the Pipeline Cluster Traversal Experiments, the label column should be from the same pipeline dataset).

Returns:

optimal_scores (list) – When model_type = "cls", returns the [name, accuracy, precision, recall, latency] info of the model validation results. When model_type = "reg", returns the [name, R-squared, MAE, MSE, RMSE, latency] info of the model validation results.
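
The two score lists can be computed as in the sketch below, which uses only the standard library. Several details are assumptions rather than documented behaviour: labels are binary with 1 as the positive class, latency is the wall-clock time of the prediction call in seconds, and the helper evaluate together with the toy "models" (plain functions) is hypothetical.

```python
import time
from math import sqrt


def evaluate(name, predict, features, labels, model_type):
    """Return [name, <scores...>, latency_seconds] for 'cls' or 'reg'."""
    start = time.perf_counter()
    preds = predict(features)
    latency = time.perf_counter() - start
    n = len(labels)
    if model_type == "cls":
        pairs = list(zip(preds, labels))
        tp = sum(p == 1 and t == 1 for p, t in pairs)
        fp = sum(p == 1 and t == 0 for p, t in pairs)
        fn = sum(p == 0 and t == 1 for p, t in pairs)
        accuracy = sum(p == t for p, t in pairs) / n
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        return [name, accuracy, precision, recall, latency]
    if model_type == "reg":
        errors = [t - p for p, t in zip(preds, labels)]
        mean_label = sum(labels) / n
        ss_res = sum(e * e for e in errors)                  # residual sum of squares
        ss_tot = sum((t - mean_label) ** 2 for t in labels)  # total sum of squares
        r_squared = 1 - ss_res / ss_tot  # undefined when all labels are equal
        mae = sum(abs(e) for e in errors) / n
        mse = ss_res / n
        return [name, r_squared, mae, mse, sqrt(mse), latency]
    raise ValueError("model_type must be 'cls' or 'reg'")


# Hypothetical stand-in models: functions mapping features to predictions.
cls_scores = evaluate("toy_cls", lambda xs: [int(v > 0.5) for v in xs],
                      [0.9, 0.2, 0.4, 0.8, 0.7], [1, 0, 1, 1, 0], "cls")
reg_table = {1: 1.5, 2: 2.0, 3: 2.5, 4: 4.0}
reg_scores = evaluate("toy_reg", lambda xs: [reg_table[v] for v in xs],
                      [1, 2, 3, 4], [1.0, 2.0, 3.0, 4.0], "reg")
```

For the toy classifier, three of five predictions are correct, so accuracy is 0.6, and precision and recall are both 2/3. For the toy regressor, R-squared is 0.9, MAE is 0.25, and MSE is 0.125.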

clf_cv¶

class dynapipe.estimatorCV.clf_cv(cv_val=None, random_state=None)[source]¶

This class stores classification estimators.

Parameters:

cv_val (int, default = None) – # of folds for cross-validation.
random_state (int, default = None) – Random state value.

reg_cv¶

class dynapipe.estimatorCV.reg_cv(cv_val=None, random_state=None)[source]¶

This class stores regression estimators.

Parameters:

cv_val (int, default = None) – # of folds for cross-validation.
random_state (int, default = None) – Random state value.

data_splitting_tool¶

dynapipe.utilis_func.data_splitting_tool(feature_cols=None, label_col=None, val_size=0.2, test_size=0.2, random_state=13)[source]¶

Splits each pipeline's dataset into train, validate, and test parts for Pipeline Cluster Traversal Experiments.

NOTE: When in_pipeline = "True", this function is built into the autoPipe module, so the splitting rule must be set up with pipeline_splitting_rule().

Parameters:

feature_cols (df, default = None) – Feature columns.
label_col (array/df, default = None) – Column of label.
val_size (float, default = 0.2) – Value within [0, 1]. Fraction of the data used for validation. NOTE - When val_size is given no value, X_val and y_val are not returned.
test_size (float, default = 0.2) – Value within [0, 1]. Fraction of the data used for testing.
random_state (int, default = 13) – Random state value.

Returns:

X_train (array) – Train features dataset
y_train (array) – Train label dataset
X_val (array) – Validate features dataset
y_val (array) – Validate label dataset
X_test (array) – Test features dataset
y_test (array) – Test label dataset
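
A three-way split of this kind can be sketched with the standard library as follows. The sketch assumes that val_size and test_size are both fractions of the full dataset, which the text above does not state explicitly; split_train_val_test is a hypothetical helper, not the function documented here, and it works on plain lists rather than a df.

```python
import random


def split_train_val_test(features, labels, val_size=0.2, test_size=0.2,
                         random_state=13):
    """Shuffle once with a fixed seed, then cut into train/validate/test parts.

    Assumption: val_size and test_size are fractions of the full dataset.
    """
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    if val_size < 0 or test_size < 0 or val_size + test_size >= 1:
        raise ValueError("val_size and test_size must be >= 0 and sum to < 1")
    indices = list(range(len(features)))
    random.Random(random_state).shuffle(indices)  # seeded, so reproducible
    n_test = round(len(indices) * test_size)
    n_val = round(len(indices) * val_size)
    test_idx = indices[:n_test]
    val_idx = indices[n_test:n_test + n_val]
    train_idx = indices[n_test + n_val:]

    def take(data, idx):
        return [data[i] for i in idx]

    return (take(features, train_idx), take(labels, train_idx),
            take(features, val_idx), take(labels, val_idx),
            take(features, test_idx), take(labels, test_idx))


features = [[i, i * 2] for i in range(10)]
labels = [i % 2 for i in range(10)]
X_train, y_train, X_val, y_val, X_test, y_test = split_train_val_test(
    features, labels)
```

With ten rows and the default sizes, the parts hold six, two, and two rows. Reusing the same random_state reproduces the same split.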

reset_parameters¶

dynapipe.utilis_func.reset_parameters()[source]¶

Reset autoCV estimators' hyperparameters and search ranges to their default values.

Parameters:	None
Returns:	None

Example
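
Resetting to defaults is commonly done by keeping an untouched copy of the default values and deep-copying it over the working settings. A minimal sketch of that pattern follows; DEFAULT_GRIDS, current_grids, and reset_grids are hypothetical names, and this is not necessarily how dynapipe stores its defaults.

```python
import copy

# Default search ranges for two estimators, as listed in the classifier table.
DEFAULT_GRIDS = {
    "svm": {"C": [0.1, 1, 10], "kernel": ["linear", "poly", "rbf", "sigmoid"]},
    "rf": {"max_depth": [2, 4, 8, 16, 32], "n_estimators": [5, 50, 250]},
}

# Working copy that callers are free to modify.
current_grids = copy.deepcopy(DEFAULT_GRIDS)


def reset_grids():
    """Replace the working copy with a fresh deep copy of the defaults."""
    global current_grids
    current_grids = copy.deepcopy(DEFAULT_GRIDS)


current_grids["svm"]["C"].append(100)  # a custom change...
reset_grids()                          # ...is discarded by the reset
```

The deep copy matters: a shallow copy would share the inner lists, so the append above would have altered the defaults as well.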

update_parameters¶

dynapipe.utilis_func.update_parameters(mode='None', estimator_name='None', **kwargs)[source]¶

Update autoCV estimators' hyperparameters and search ranges to custom values.

NOTE: Each call can update only one estimator.

Parameters:

mode (str, default = None) – Value in ["cls", "reg"]. "cls" modifies classification estimators; "reg" modifies regression estimators.
estimator_name (str, default = None) – Name of the estimator.
**kwargs (list, default = None) – Comma-separated keyword arguments, each taking a list of values, e.g. C=[0.1,0.2], kernel=["linear"].

Returns:	None

Example
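
The keyword-argument style described above (one estimator per call, each parameter given as a list) can be sketched as below. update_grid and the grids dictionary are hypothetical stand-ins, so this shows the calling pattern rather than the module's internals.

```python
def update_grid(grids, mode, estimator_name, **kwargs):
    """Overwrite the search range of the given parameters for one estimator."""
    if mode not in ("cls", "reg"):
        raise ValueError("mode must be 'cls' or 'reg'")
    if estimator_name not in grids[mode]:
        raise KeyError(f"unknown estimator: {estimator_name}")
    for param, values in kwargs.items():
        if not isinstance(values, list):
            raise TypeError(f"{param} must be given as a list of values")
        grids[mode][estimator_name][param] = list(values)
    return grids


grids = {
    "cls": {
        "svm": {"C": [0.1, 1, 10], "kernel": ["linear", "poly", "rbf", "sigmoid"]},
        "lgr": {"C": [0.001, 0.01, 0.1, 1, 10, 100, 1000]},
    },
    "reg": {
        "svm": {"C": [0.1, 1, 10], "kernel": ["linear", "poly", "rbf", "sigmoid"]},
    },
}
# One call updates one estimator in one mode; everything else is untouched.
update_grid(grids, mode="cls", estimator_name="svm",
            C=[0.1, 0.2], kernel=["linear"])
```

After the call, only the classification svm entry has changed. The regression svm entry and the lgr entry keep their original ranges.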

export_parameters¶

dynapipe.utilis_func.export_parameters()[source]¶

Export the current autoCV estimators' hyperparameters and search ranges to the current working directory.

Parameters:	None
Returns:	None

Example
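
One way to export such settings is to serialize them to a file. The sketch below writes JSON, which is an assumption: the export format and file name are not stated here, and export_grids and exported_grids.json are hypothetical. A temporary folder stands in for the target folder.

```python
import json
import tempfile
from pathlib import Path


def export_grids(grids, directory, filename="exported_grids.json"):
    """Write the grids to <directory>/<filename> as JSON and return the path."""
    path = Path(directory) / filename
    with path.open("w", encoding="utf-8") as handle:
        json.dump(grids, handle, indent=2, sort_keys=True)
    return path


grids = {
    "cls": {
        "svm": {"C": [0.1, 1, 10], "kernel": ["linear", "poly", "rbf", "sigmoid"]},
    },
}
# A temporary folder is used here so the sketch leaves no file behind.
export_path = export_grids(grids, tempfile.mkdtemp())
reloaded = json.loads(export_path.read_text(encoding="utf-8"))
```

Reading the file back yields the same nested structure, which makes an export like this a convenient record of the search ranges used in an experiment.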

Defaults Parameters for Classifiers/Regressors¶

Estimators default parameters setting:

Classifiers Estimators Default Parameters Searching Range¶
Estimators:	Parameters:	Value Range:
lgr	'C'	[0.001, 0.01, 0.1, 1, 10, 100, 1000]
svm	'C'	[0.1, 1, 10]
	'kernel'	['linear', 'poly', 'rbf', 'sigmoid']
mlp	'activation'	['identity', 'relu', 'tanh', 'logistic']
	'hidden_layer_sizes'	[10, 50, 100]
	'learning_rate'	['constant', 'invscaling', 'adaptive']
	'solver'	['lbfgs', 'sgd', 'adam']
ada	'n_estimators'	[50, 100, 150]
	'learning_rate'	[0.01, 0.1, 1, 5, 10]
rf	'max_depth'	[2, 4, 8, 16, 32]
	'n_estimators'	[5, 50, 250]
gb	'n_estimators'	[50, 100, 150, 200, 250, 300]
	'max_depth'	[1, 3, 5, 7, 9]
	'learning_rate'	[0.01, 0.1, 1, 10, 100]
xgb	'n_estimators'	[50, 100, 150, 200, 250, 300]
	'max_depth'	[3, 5, 7, 9]
	'learning_rate'	[0.01, 0.1, 0.2, 0.3, 0.4]

Regressors Default Parameters Searching Range¶
Estimators:	Parameters:	Value Range:
lr	'normalize'	[True, False]
svm	'C'	[0.1, 1, 10]
	'kernel'	['linear', 'poly', 'rbf', 'sigmoid']
mlp	'activation'	['identity', 'relu', 'tanh', 'logistic']
	'hidden_layer_sizes'	[10, 50, 100]
	'learning_rate'	['constant', 'invscaling', 'adaptive']
	'solver'	['lbfgs', 'adam']
ada	'n_estimators'	[50, 100, 150, 200, 250, 300]
	'loss'	['linear', 'square', 'exponential']
	'learning_rate'	[0.01, 0.1, 0.2, 0.3, 0.4]
tree	'splitter'	['best', 'random']
	'max_depth'	[1, 3, 5, 7, 9]
	'min_samples_leaf'	[1, 3, 5]
rf	'max_depth'	[2, 4, 8, 16, 32]
	'n_estimators'	[5, 50, 250]
gb	'n_estimators'	[50, 100, 150, 200, 250, 300]
	'max_depth'	[3, 5, 7, 9]
	'learning_rate'	[0.01, 0.1, 0.2, 0.3, 0.4]
xgb	'n_estimators'	[50, 100, 150, 200, 250, 300]
	'max_depth'	[3, 5, 7, 9]
	'learning_rate'	[0.01, 0.1, 0.2, 0.3, 0.4]
sgd	'shuffle'	[True, False]
	'penalty'	['l2', 'l1', 'elasticnet']
	'learning_rate'	['constant', 'optimal', 'invscaling']
cvlasso	'fit_intercept'	[True, False]
rgcv	'fit_intercept'	[True, False]
huber	'fit_intercept'	[True, False]
hgboost	'max_depth'	[3, 5, 7, 9]
	'learning_rate'	[0.1, 0.2, 0.3, 0.4]
