Validation Report

Validation reports are generated to evaluate a model's quality. Rather than writing your own evaluation functions, you can use these reports to quickly understand how a model performs.

A validation report is a markdown report made up of several phases. Each phase examines a different aspect of the model, and together they let you infer its real-world performance.

How to access the feature?

from sanatio.validations.routines import ValidationRoutine

val_object = ValidationRoutine(...parameters...)

Parameters required for initialization

Not all parameters are required to create an object. Initialize only the parameters required by the routine you plan to run.

Each parameter is listed below as name (data type): description.

  • predicted (Data Series): The predictions made by the model.

  • actual (Data Series): The ground-truth values used to evaluate the model.

  • weight (Array): The model's learned weights (coefficients).

  • data (Data Frame): The feature data, with the same columns that were used to train the model.

  • predicted_probability (Data Series): The predicted probability of label 1.

  • two_class_probability (Data Frame): The predicted probabilities of both classes.

  • pearson_threshold (Float - Optional): The threshold used by the Pearson correlation check.

  • vif_factor (Int - Optional): The threshold used by the variance inflation factor (VIF) check.

  • cat_columns (List): The names of the categorical columns in the data frame.

  • markdown_report (Boolean): True to generate a markdown report.

  • print_report (Boolean): True to print the report.

  • cluster_centroids (Data Series): The cluster labels predicted by the model.

  • cluster_k_values (List): The values of k to use if you want to train a clustering model.

  • cluster_train (Boolean): True to train a clustering model.
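The list above does not spell out how pearson_threshold and vif_factor are applied. As a hedged illustration of the standard diagnostics these names suggest, the plain-Python sketch below does two things. First, it flags feature pairs whose absolute Pearson correlation exceeds a threshold. Second, it computes each feature's variance inflation factor (VIF) as the corresponding diagonal entry of the inverse correlation matrix. This is not Sanatio's code. The comparison rules used here (absolute value, strict "greater than") are assumptions, and the helper names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlated_pairs(columns, threshold):
    """Hypothetical helper: feature pairs whose |correlation| exceeds threshold."""
    return [(a, b) for a, b in combinations(columns, 2)
            if abs(pearson(columns[a], columns[b])) > threshold]


def invert(matrix):
    """Invert a square matrix with Gauss-Jordan elimination and partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """Hypothetical helper: VIF per feature = diagonal of the inverse correlation matrix."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}


# x1 and x2 are nearly collinear; x3 is only weakly related to either
columns = {"x1": [1, 2, 3, 4, 5], "x2": [2, 4, 6, 8, 11], "x3": [2, 5, 1, 4, 3]}
print(correlated_pairs(columns, threshold=0.4))              # [('x1', 'x2')]
print([name for name, v in vif(columns).items() if v > 5])   # ['x1', 'x2']
```

Using the correlation-matrix inverse avoids fitting one auxiliary regression per feature. The result is identical to the textbook definition VIF = 1 / (1 - R²), where R² comes from regressing one feature on all the others.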

All parameters default to None. If a parameter required by a specific routine is missing, you will receive an error message indicating that the parameter is missing.

Routines

Routines generate different validation report structures, depending on the type of model you use.

The routines available in Sanatio, with their respective function calls, are:

  • Binary logistic regression routine

    • binary_logistic_regression_routine()

  • Linear regression routine

    • linear_regression_routine()

  • Tree based classification routine

    • tree_based_classification_routine()
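If your code handles several kinds of models, you might choose the routine by name at runtime. The sketch below assumes a hypothetical mapping from scikit-learn estimator class names to the routine methods listed above. Which estimators each routine actually supports is an assumption, and neither ROUTINE_FOR_MODEL nor run_routine is part of Sanatio.

```python
# Hypothetical mapping from estimator class names to the routine methods listed above.
# Which estimators each routine supports is an assumption; check Sanatio's documentation.
ROUTINE_FOR_MODEL = {
    "LogisticRegression": "binary_logistic_regression_routine",
    "LinearRegression": "linear_regression_routine",
    "DecisionTreeClassifier": "tree_based_classification_routine",
    "RandomForestClassifier": "tree_based_classification_routine",
}


def run_routine(val_object, model):
    """Hypothetical helper: call the routine on val_object that matches the model's class."""
    model_name = type(model).__name__
    try:
        routine_name = ROUTINE_FOR_MODEL[model_name]
    except KeyError:
        raise ValueError(f"No routine registered for {model_name}") from None
    # Look the routine up by name and call it with no arguments
    return getattr(val_object, routine_name)()
```

For example, run_routine(val_object, lr) with a fitted LogisticRegression would call val_object.binary_logistic_regression_routine().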

How to validate the model?

1. Import the ValidationRoutine class from sanatio.validations.routines.

  2. Create a validation object and initialize it with the parameters your model's routine requires.

  3. Call the routine function you need.

from sanatio.validations.routines import ValidationRoutine

val_object = ValidationRoutine(...parameters...)
val_object.binary_logistic_regression_routine()

Example

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
import pandas as pd
from sanatio.validations.routines import ValidationRoutine

# Load the dataset once; the returned object also holds the feature names
data = load_breast_cancer()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# Fit the model (max_iter raised so the solver converges on these unscaled features)
lr = LogisticRegression(max_iter=10000)
lr.fit(X_train, y_train)

weights = lr.coef_                      # shape (1, n_features)
prediction = lr.predict(X_test)         # predicted labels
two_class = lr.predict_proba(X_test)    # probabilities of class 0 and class 1
one_class = two_class[:, 1]             # probability of label 1

obj = ValidationRoutine(predicted=prediction, actual=y_test,
                        weight=weights.transpose(),  # transposed to shape (n_features, 1)
                        data=pd.DataFrame(X_test, columns=data.feature_names),
                        predicted_probability=one_class,
                        two_class_probability=two_class,
                        pearson_threshold=0.4, vif_factor=5, cat_columns=None)

obj.binary_logistic_regression_routine()
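Before calling a routine, it can help to confirm that the prediction inputs agree with each other. For scikit-learn's LogisticRegression on a binary problem, predict() returns 1 when the class-1 probability is above 0.5 (up to floating-point rounding at exactly 0.5). Each row of predict_proba sums to 1. The sketch below checks these properties. check_binary_inputs is a hypothetical helper, not part of Sanatio, and whether Sanatio runs similar checks itself is not stated here. It expects plain sequences or NumPy arrays, so convert a DataFrame with .to_numpy() first.

```python
def check_binary_inputs(predicted, predicted_probability, two_class_probability, tol=1e-9):
    """Hypothetical helper: list inconsistencies between labels and probabilities."""
    if not (len(predicted) == len(predicted_probability) == len(two_class_probability)):
        return ["inputs have different lengths"]
    problems = []
    rows = zip(predicted, predicted_probability, two_class_probability)
    for i, (label, p1, (col0, col1)) in enumerate(rows):
        # Each row of the two-class output should sum to 1
        if abs(col0 + col1 - 1.0) > tol:
            problems.append(f"row {i}: class probabilities do not sum to 1")
        # predicted_probability should match the class-1 column
        if abs(col1 - p1) > tol:
            problems.append(f"row {i}: predicted_probability differs from column 1")
        # With a 0.5 cut-off, label 1 is expected exactly when p1 > 0.5
        if int(label) != int(p1 > 0.5):
            problems.append(f"row {i}: label {label} disagrees with probability {p1}")
    return problems
```

With the variables from the example above, check_binary_inputs(prediction, one_class, two_class) would be expected to return an empty list.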
