Package 'scorecardModelUtils' reference manual

Title:	Credit Scorecard Modelling Utils
Description:	Provides infrastructure functions such as missing value treatment, information value (IV) calculation and GINI calculation, which are used for developing both traditional credit scorecards and machine-learning-based models. The functions implement standard steps of credit underwriting scorecard development and are used extensively in the financial domain.
Authors:	Arya Poddar [aut, cre], Aiana Goyal [ctb], Kanishk Dogar [ctb]
Maintainer:	Arya Poddar <[email protected]>
License:	GPL-2 | GPL-3
Version:	0.0.1.0
Built:	2025-02-18 04:41:52 UTC
Source:	https://github.com/cran/scorecardModelUtils

Clubbing class of categorical variables with low population percentage with another class of similar event rate

Description

The function groups each class of a categorical variable whose population percentage is below a threshold with another class of similar event rate. If no class has exactly the same event rate, the class is clubbed with the class whose event rate is the closest higher one.
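The clubbing rule can be illustrated with a small base-R sketch. It is not the package's implementation: the helper name `club_low_pop_classes` is hypothetical, it works on a single vector rather than a dataframe, and the fallback to the closest lower event rate (used only when no class has the same or a higher event rate) is an assumption that the description does not cover.

```r
# Hypothetical helper illustrating the clubbing rule; not the package's code.
# Returns a named character vector mapping each original class to its new class.
club_low_pop_classes <- function(x, y, threshold, event = 1) {
  x <- as.character(x)
  is_event <- as.integer(y == event)
  counts <- sapply(split(is_event, x), length)      # population per class
  pop_share <- counts / sum(counts)                  # population fraction
  event_rate <- sapply(split(is_event, x), mean)     # event rate per class
  classes <- names(pop_share)
  mapping <- setNames(classes, classes)
  for (cls in classes[pop_share < threshold]) {
    others <- setdiff(classes, cls)
    if (length(others) == 0) next
    gap <- event_rate[others] - event_rate[[cls]]
    higher <- others[gap >= 0]                       # same or higher event rate
    mapping[[cls]] <- if (length(higher) > 0) {
      higher[which.min(gap[higher])]                 # exact match wins (gap 0)
    } else {
      others[which.min(abs(gap))]                    # assumption: closest lower rate
    }
  }
  mapping
}

x <- c(rep("A", 45), rep("B", 45), rep("C", 10))
y <- c(rep(1, 9), rep(0, 36), rep(1, 27), rep(0, 18), rep(1, 5), rep(0, 5))
club_low_pop_classes(x, y, threshold = 0.15)
```

Here class C holds 10% of the rows (below the 0.15 threshold) with an event rate of 0.5, so it is mapped to B (event rate 0.6), the closest higher rate; A (0.2) is lower and therefore not a candidate.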

Usage

cat_new_class(base, target, cat_var_name, threshold, event = 1)

Arguments

`base`	input dataframe
`target`	column / field name for the target variable to be passed as string (must be 0/1 type)
`cat_var_name`	column name or array of column names of categorical variable on which the operation is to be done, to be passed as string
`threshold`	threshold population percentage below which a class will be clubbed with another class, to be provided as a decimal/fraction
`event`	(optional) the event class, to be passed as 0 or 1 (default is 1)

Value

The function returns an object of class "cat_new_class" which is a list containing the following components:

`base_new`	a dataframe after clubbing low percentage classes with another class of similar or closest but higher event rate
`cat_class_new`	a dataframe with mapping between original classes and new clubbed classes (if any)

`cv_table`	dataframe of class cv_table with three columns - var_1, var_2, cv_value
`iv_table`	dataframe of class iv_table with two columns - Variable_name, iv
`threshold`	Cramer's V value above which one of the two variables in a pair will be recommended to be dropped

`retain_var_list`	list of variables remaining post CV filter
`dropped_var_list`	list of variables that can be dropped based on CV filter
`dropped_var_tab`	CV correlation value for dropped variables as a dataframe
`threshold`	threshold CV value used as input parameter
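One plausible reading of the Cramer's V filter is sketched below in base R. The rule used to pick which variable of a correlated pair is dropped is not stated here; the sketch assumes the variable with the lower IV is dropped, and `cv_filter_sketch` is a hypothetical name. The input column names follow the `cv_table` and `iv_table` descriptions above.

```r
# Hypothetical sketch of a Cramer's V filter; not the package's implementation.
# Assumption: for each pair above the threshold, drop the variable with lower IV.
cv_filter_sketch <- function(cv_table, iv_table, threshold) {
  iv <- setNames(iv_table$iv, iv_table$Variable_name)
  cv_table <- cv_table[order(-cv_table$cv_value), ]   # strongest pairs first
  dropped <- character(0)
  for (i in seq_len(nrow(cv_table))) {
    if (cv_table$cv_value[i] <= threshold) break      # remaining pairs are weaker
    v1 <- cv_table$var_1[i]
    v2 <- cv_table$var_2[i]
    if (v1 %in% dropped || v2 %in% dropped) next      # pair already resolved
    dropped <- c(dropped, if (iv[[v1]] >= iv[[v2]]) v2 else v1)
  }
  list(retain_var_list = setdiff(names(iv), dropped),
       dropped_var_list = dropped,
       threshold = threshold)
}

cv_tab <- data.frame(var_1 = c("a", "a", "b"), var_2 = c("b", "c", "c"),
                     cv_value = c(0.8, 0.1, 0.7), stringsAsFactors = FALSE)
iv_tab <- data.frame(Variable_name = c("a", "b", "c"), iv = c(0.3, 0.2, 0.1),
                     stringsAsFactors = FALSE)
cv_filter_sketch(cv_tab, iv_tab, threshold = 0.5)
```

Processing the strongest pairs first means that once `b` is dropped for its association with `a`, the pair (`b`, `c`) no longer forces `c` out.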

`base`	input dataframe
`column_name`	column name or array of column names for which Cramer's V is to be calculated

`cv_val_tab`	pairwise Cramer's V value as a dataframe
`single_class_var_index`	array of column index of variables with only one class

`base`	input dataframe
`var_1`	categorical variable name, to be passed as string
`var_2`	categorical variable name, to be passed as string
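Cramer's V for two categorical variables is commonly computed as V = sqrt(chi2 / (n * (min(r, c) - 1))), where chi2 is the Pearson chi-squared statistic of their r x c contingency table. The base-R sketch below uses that standard, uncorrected formula; whether the package applies a bias correction is not stated here, and the helper names are hypothetical. A variable with a single class yields NA, which corresponds to the `single_class_var_index` case above.

```r
# Cramer's V between two categorical vectors (standard, uncorrected formula).
cramers_v <- function(a, b) {
  tab <- table(a, b)
  k <- min(dim(tab)) - 1
  if (k == 0) return(NA_real_)                 # a variable has only one class
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  unname(sqrt(chi2 / (sum(tab) * k)))
}

# Pairwise values for a set of columns (hypothetical helper)
pairwise_cramers_v <- function(base, column_name) {
  pairs <- combn(column_name, 2)               # one column per variable pair
  data.frame(var_1 = pairs[1, ], var_2 = pairs[2, ],
             cv_value = apply(pairs, 2, function(p)
               cramers_v(base[[p[1]]], base[[p[2]]])),
             stringsAsFactors = FALSE)
}

a <- c("x", "x", "y", "y")
b <- c("p", "p", "q", "q")   # perfectly associated with a
d <- c("p", "q", "p", "q")   # independent of a
cramers_v(a, b)              # 1
cramers_v(a, d)              # 0
```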

`desc_model`	ctree class model with one variable
`variable`	name of the numerical variable on which the decision tree was run, to be passed as string

`base`	input dataframe
`observed_col`	column / field name of the observed event
`predicted_col`	column / field name of the predicted event
`event`	the event class, to be passed as string

`confusion_mat`	confusion matrix as a table
`accuracy`	accuracy measure
`precision`	precision measure
`recall`	recall measure
`sensitivity`	sensitivity measure
`specificity`	specificity measure
`f1_score`	F1 score
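The measures listed above follow from the four cells of a binary confusion matrix. The base-R sketch below shows the standard definitions; `confusion_measures` is a hypothetical name, and, as in the argument list, the event class is passed as a string.

```r
# Standard confusion-matrix measures for a binary outcome (hypothetical helper).
confusion_measures <- function(observed, predicted, event) {
  obs <- as.character(observed) == event
  pred <- as.character(predicted) == event
  tp <- sum(obs & pred);  fn <- sum(obs & !pred)
  fp <- sum(!obs & pred); tn <- sum(!obs & !pred)
  precision <- tp / (tp + fp)
  recall <- tp / (tp + fn)                 # recall is the same as sensitivity
  list(confusion_mat = table(observed = observed, predicted = predicted),
       accuracy = (tp + tn) / length(obs),
       precision = precision,
       recall = recall,
       sensitivity = recall,
       specificity = tn / (tn + fp),
       f1_score = 2 * precision * recall / (precision + recall))
}

observed  <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
predicted <- c(1, 1, 1, 0, 1, 1, 0, 0, 0, 0)
res <- confusion_measures(observed, predicted, event = "1")
# TP = 3, FN = 1, FP = 2, TN = 4: accuracy 0.7, precision 0.6, recall 0.75
```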

`mean_abs_error`	mean absolute error between observed and predicted value
`mean_sq_error`	mean squared error between observed and predicted value
`root_mean_sq_error`	root mean squared error between observed and predicted value
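The three error measures have standard definitions, shown below as a short base-R sketch (`error_measures` is a hypothetical name).

```r
# MAE, MSE and RMSE between observed and predicted values.
error_measures <- function(observed, predicted) {
  err <- predicted - observed
  mse <- mean(err^2)
  list(mean_abs_error = mean(abs(err)),
       mean_sq_error = mse,
       root_mean_sq_error = sqrt(mse))
}

e <- error_measures(observed = c(1, 2, 3), predicted = c(2, 2, 5))
# errors 1, 0, 2: MAE = 1, MSE = 5/3, RMSE = sqrt(5/3)
```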

`prediction`	base with the predicted value as a dataframe
`gini_tab`	gini table as a dataframe
`gini_value`	gini coefficient value
`gini_plot`	gini curve plot
`ks_value`	Kolmogorov-Smirnov statistic
`breaks`	break points
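A common way to obtain the Gini coefficient is Gini = 2 * AUC - 1, with the AUC computed from ranks (Mann-Whitney statistic). The KS statistic is the largest gap between the cumulative event and non-event distributions across score cut-offs. The sketch below uses these standard definitions; it is not necessarily how the package builds `gini_tab` (which may be bin-based), `gini_ks` is a hypothetical name, and it assumes a higher score means a higher likelihood of the event.

```r
# Gini = 2 * AUC - 1 and the KS statistic (hypothetical helper).
# Assumes a higher score means a higher likelihood of the event.
gini_ks <- function(score, y, event = 1) {
  is_event <- y == event
  n1 <- sum(is_event)
  n0 <- sum(!is_event)
  r <- rank(score)                                  # average ranks for ties
  auc <- (sum(r[is_event]) - n1 * (n1 + 1) / 2) / (n1 * n0)
  cuts <- sort(unique(score), decreasing = TRUE)    # candidate cut-offs
  cum_event <- sapply(cuts, function(t) mean(score[is_event] >= t))
  cum_non_event <- sapply(cuts, function(t) mean(score[!is_event] >= t))
  list(gini_value = 2 * auc - 1,
       ks_value = max(abs(cum_event - cum_non_event)))
}

g <- gini_ks(score = c(0.9, 0.3, 0.8, 0.2), y = c(1, 1, 0, 0))
# gini_value 0.5, ks_value 0.5
```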

`base`	input dataframe
`k`	number of folds for cross validation
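A random fold index can be produced by recycling the labels 1..k over the rows and shuffling them, which keeps fold sizes within one row of each other. This is a minimal base-R sketch; `kfold_index` is a hypothetical name, and the package may assign folds differently.

```r
# One random fold label (1..k) per row of base, with balanced fold sizes.
kfold_index <- function(base, k) {
  sample(rep_len(seq_len(k), nrow(base)))
}

set.seed(42)
folds <- kfold_index(data.frame(id = 1:10), k = 3)
table(folds)   # fold sizes 4, 3, 3
```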

`base`	input dataframe
`target`	column / field name for the target variable to be passed as string (must be 0/1 type)
`ntree`	number of trees to be fitted
`depth`	maximum depth of variable interactions
`shrinkage`	learning rate
`min_obs`	minimum size of terminal nodes
`bag_fraction`	fraction of the training set observations randomly selected for next tree
`error`	(optional) error measure as objective function to be minimised, to be chosen among "mae", "mse" and "rmse" (default value is "rmse")
`cv`	(optional) k value for the k-fold cross validation to be performed (default value is 1, i.e. without cross validation)

`error_tab_detailed`	error summary for each cross validation sample of the parameter combinations iterated during grid search as a dataframe
`error_tab_summary`	error summary for each combination of parameters as a dataframe
`best_ntree`	ntree parameter of the optimal solution
`best_depth`	depth parameter of the optimal solution
`best_shrinkage`	shrinkage parameter of the optimal solution
`best_min_obs`	min_obs parameter of the optimal solution
`best_bag_fraction`	bag_fraction parameter of the optimal solution
`runtime`	runtime of the entire process
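The grid search mechanics can be sketched in base R: enumerate every parameter combination, compute an error for each, and keep the combination with the smallest error. In the sketch, `toy_error` is a hypothetical stand-in for fitting a model (for example a GBM, optionally with k-fold cross validation) and returning its error; `grid_search_sketch` is likewise a hypothetical name, not the package's function.

```r
# Exhaustive grid search over parameter combinations (hypothetical helper).
grid_search_sketch <- function(param_grid, eval_fun) {
  grid <- expand.grid(param_grid, KEEP.OUT.ATTRS = FALSE,
                      stringsAsFactors = FALSE)
  grid$error <- vapply(seq_len(nrow(grid)),
                       function(i) eval_fun(grid[i, , drop = FALSE]),
                       numeric(1))
  list(error_tab_summary = grid,                       # one row per combination
       best = grid[which.min(grid$error), , drop = FALSE])
}

# Stand-in for "fit the model with these parameters and return its error"
toy_error <- function(p) (p$ntree - 200)^2 / 1e4 + (p$depth - 3)^2

res <- grid_search_sketch(list(ntree = c(100, 200, 300), depth = 2:4), toy_error)
res$best   # ntree = 200, depth = 3, error = 0
```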

`base`	input dataframe
`target`	column / field name for the target variable to be passed as string (must be 0/1 type)
`model_type`	to be chosen among "regression" or "classification"
`ntree`	number of trees to be fitted
`mtry`	number of variables randomly sampled as split candidates at each node
`maxnodes`	(optional) maximum number of terminal nodes (default is NULL, i.e. no restriction on the size of the trees)
`nodesize`	minimum size of terminal nodes
`error`	(optional) error measure as objective function to be minimised, to be chosen among "mae", "mse" and "rmse" (default value is "rmse")
`cv`	(optional) k value for the k-fold cross validation to be performed (default value is 1, i.e. without cross validation)

`retain_var_tab`	variables remaining post IV filter as a dataframe
`retain_var_name`	array of column names of variables to be retained
`dropped_var_tab`	variables that can be dropped based on IV filter as a dataframe
`threshold`	threshold IV value used as input parameter
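An IV filter can be sketched as a simple split of the IV table at the threshold. The sketch assumes variables with IV at or above the threshold are retained and the rest dropped; `iv_filter_sketch` is a hypothetical name, and the input columns follow the `iv_table` description (Variable_name, iv).

```r
# Split an IV table at a threshold (hypothetical helper).
# Assumption: IV >= threshold is retained, IV < threshold is dropped.
iv_filter_sketch <- function(iv_table, threshold) {
  keep <- iv_table$iv >= threshold
  list(retain_var_tab = iv_table[keep, , drop = FALSE],
       retain_var_name = iv_table$Variable_name[keep],
       dropped_var_tab = iv_table[!keep, , drop = FALSE],
       threshold = threshold)
}

iv_tab <- data.frame(Variable_name = c("a", "b", "c"), iv = c(0.3, 0.01, 0.1),
                     stringsAsFactors = FALSE)
iv_filter_sketch(iv_tab, threshold = 0.02)$retain_var_name   # "a" "c"
```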

`num_woe_table`	numerical woe table with IV as a dataframe
`cat_woe_table`	categorical woe table with IV as a dataframe
`woe_table`	numerical and categorical woe table with IV as a dataframe
`iv_table`	variables with their IV values as a dataframe
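For a categorical variable, WOE and IV are typically computed per class from the share of events and non-events falling in that class. The base-R sketch below uses the convention WOE = ln(%non-event / %event); the opposite sign convention is also common and gives the same IV, since IV sums (%non-event - %event) * WOE. `woe_iv_cat` is a hypothetical name, and a class with zero events or zero non-events produces an infinite WOE that a real implementation would need to handle.

```r
# WOE per class and total IV for one categorical variable (hypothetical helper).
woe_iv_cat <- function(x, y, event = 1) {
  x <- as.character(x)
  is_event <- y == event
  events <- sapply(split(is_event, x), sum)
  non_events <- sapply(split(!is_event, x), sum)
  dist_event <- events / sum(events)
  dist_non_event <- non_events / sum(non_events)
  woe <- log(dist_non_event / dist_event)   # convention: ln(%non-event / %event)
  iv_part <- (dist_non_event - dist_event) * woe
  list(woe_table = data.frame(class = names(woe), woe = unname(woe),
                              iv = unname(iv_part), stringsAsFactors = FALSE),
       iv = sum(iv_part))
}

x <- c(rep("A", 10), rep("B", 10))
y <- c(rep(1, 8), rep(0, 2), rep(1, 2), rep(0, 8))
woe_iv_cat(x, y)$iv   # 1.2 * log(4), about 1.664
```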

`base`	a dataframe after imputing missing values
`mapping_table`	a dataframe with mapping between original variable and imputed missing value (if any)
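The imputation rule is not specified here. The sketch below assumes the median for numerical columns and the mode (most frequent value) for categorical columns, and returns a mapping of each imputed variable to its fill value. `mode_value` and `impute_missing_sketch` are hypothetical names.

```r
# Most frequent non-missing value, returned as character (ties: first in table order).
mode_value <- function(v) {
  tab <- table(v[!is.na(v)])
  names(tab)[which.max(tab)]
}

# Assumption: median for numerical columns, mode for categorical columns.
impute_missing_sketch <- function(base) {
  mapping <- data.frame(variable = character(0), imputed = character(0),
                        stringsAsFactors = FALSE)
  for (col in names(base)) {
    miss <- is.na(base[[col]])
    if (!any(miss)) next
    fill <- if (is.numeric(base[[col]])) {
      median(base[[col]], na.rm = TRUE)
    } else {
      mode_value(base[[col]])
    }
    base[[col]][miss] <- fill
    mapping <- rbind(mapping, data.frame(variable = col,
                                         imputed = as.character(fill),
                                         stringsAsFactors = FALSE))
  }
  list(base = base, mapping_table = mapping)
}

df <- data.frame(num = c(1, NA, 3, 10), chr = c("a", "b", NA, "b"),
                 stringsAsFactors = FALSE)
impute_missing_sketch(df)$base   # num: 1 3 3 10, chr: "a" "b" "b" "b"
```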

`base`	a dataframe after converting all low percentage classes into "Low_pop_perc" class
`mapping_table`	a dataframe mapping the original classes that were converted into the "Low_pop_perc" class (if any)
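Converting low-population classes into a single "Low_pop_perc" class amounts to relabelling every class whose population share is below the threshold. The base-R sketch below does this for one vector rather than a whole dataframe; `low_pop_class_sketch` is a hypothetical name.

```r
# Relabel classes with population share below threshold as "Low_pop_perc".
low_pop_class_sketch <- function(x, threshold) {
  x <- as.character(x)
  share <- table(x) / length(x)
  low <- names(share)[share < threshold]
  x[x %in% low] <- "Low_pop_perc"
  list(base = x,
       mapping_table = data.frame(original = low,
                                  new = rep("Low_pop_perc", length(low)),
                                  stringsAsFactors = FALSE))
}

res <- low_pop_class_sketch(c(rep("a", 8), "b", "c"), threshold = 0.15)
res$base   # "b" and "c" (10% each) become "Low_pop_perc"
```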

`base`	input dataframe
`train_perc`	(optional) percentage of total base to be kept as training sample, to be provided as decimal/fraction (default percentage is 0.7)
`seed`	(optional) seed value (if not given, a random seed is generated)
`replace`	(optional) whether sampling is done with replacement (default is FALSE, i.e. without replacement)

`train_sample`	training sample as a dataframe
`test_sample`	test sample as a dataframe
`seed`	seed used
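A reproducible train/test split can be sketched by drawing the training rows with a fixed seed and using the rows never drawn as the test sample. Returning the seed lets the same split be regenerated. `split_sample_sketch` is a hypothetical name; treating undrawn rows as the test sample when `replace = TRUE` is an assumption.

```r
# Reproducible train/test split (hypothetical helper).
split_sample_sketch <- function(base, train_perc = 0.7, seed = NULL,
                                replace = FALSE) {
  if (is.null(seed)) seed <- sample.int(.Machine$integer.max, 1)
  set.seed(seed)
  n <- nrow(base)
  idx <- sample(n, size = round(train_perc * n), replace = replace)
  list(train_sample = base[idx, , drop = FALSE],
       test_sample = base[setdiff(seq_len(n), idx), , drop = FALSE],  # rows never drawn
       seed = seed)
}

s <- split_sample_sketch(data.frame(id = 1:20), train_perc = 0.7, seed = 1)
c(nrow(s$train_sample), nrow(s$test_sample))   # 14 6
```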

`base`	input dataframe
`target`	column / field name for the target variable to be passed as string (must be 0/1 type)
`model`	input logistic model from which the coefficients are to be picked
`point`	(optional) number of points after which the odds get multiplied by "factor" (default value is 15)
`factor`	(optional) factor by which the odds get multiplied for every step of "point" points (default value is 2)
`setscore`	(optional) value used to set the score offset (default value is 660)
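Scorecard scaling of this kind usually maps log odds to points linearly, so that every `point` points change the odds by a multiple of `factor`. The sketch below assumes `setscore` is the score at even odds (log odds of 0) and that higher scores mean lower event risk; both are assumptions, and `scale_score` is a hypothetical name rather than the package's function.

```r
# Linear points-to-odds scaling (hypothetical helper):
# score = setscore - (point / log(factor)) * log_odds_event
scale_score <- function(log_odds_event, point = 15, factor = 2, setscore = 660) {
  setscore - (point / log(factor)) * log_odds_event
}

scale_score(0)         # 660 at even odds
scale_score(-log(2))   # 675: halving the event odds adds `point` points
scale_score(log(2))    # 645: doubling the event odds removes `point` points
```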

`base`	input dataframe whose classes match those used in the scaling logic
`target`	column / field name for the target variable to be passed as string (must be 0/1 type)
`scalling`	dataframe of class scalling with at least the columns - Variable, Category, Coefficient, D(i,j)_hat, Score
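Scoring with a scaling table is essentially a lookup: each record's class for each variable is matched to its points in the `Score` column, and the points are summed. The base-R sketch below shows this; `score_with_table` is a hypothetical name, and any intercept or base points are assumed to be handled separately.

```r
# Total score per row by looking up Variable/Category points (hypothetical helper).
score_with_table <- function(base, scalling) {
  total <- numeric(nrow(base))
  for (v in unique(scalling$Variable)) {
    rows <- scalling[scalling$Variable == v, ]
    pts <- rows$Score[match(as.character(base[[v]]), as.character(rows$Category))]
    if (anyNA(pts)) stop("unmapped class in variable ", v)
    total <- total + pts
  }
  total
}

scalling <- data.frame(Variable = c("age", "age", "city", "city"),
                       Category = c("young", "old", "x", "y"),
                       Score = c(10, 30, 5, 15), stringsAsFactors = FALSE)
base <- data.frame(age = c("young", "old"), city = c("y", "x"),
                   stringsAsFactors = FALSE)
score_with_table(base, scalling)   # 25 35
```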

`univar_table`	univariate summary of variables
`num_var_name`	array of column names of numerical type variables
`char_var_name`	array of column names of categorical type variables
`sparse_var_name`	array of column names where the population concentration in a single class or value is more than the sparsity threshold
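A sparse variable, in the sense above, is one where a single class or value holds more than the sparsity threshold of the population. The sketch below flags such columns; `sparse_vars_sketch` and its `sparsity` argument are hypothetical names, and counting missing values as their own class is an assumption.

```r
# Columns whose most frequent value exceeds the sparsity threshold (hypothetical).
sparse_vars_sketch <- function(base, sparsity = 0.95) {
  top_share <- vapply(base, function(v)
    max(table(v, useNA = "ifany")) / length(v), numeric(1))
  names(top_share)[top_share > sparsity]
}

df <- data.frame(a = c(rep(1, 19), 2), b = rep(1:4, 5))
sparse_vars_sketch(df, sparsity = 0.9)   # "a" (95% of rows share one value)
```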

`base`	input dataframe with set of final variables only along with target
`target`	column / field name for the target variable to be passed as string (must be 0/1 type)
`threshold`	threshold value for vif (default value is 2)

`vif_table`	VIF table after VIF filtering
`model`	the model used for VIF calculation
`retain_var_list`	variables remaining in the model after the VIF filter, as an array
`dropped_var_list`	variables dropped from the model in the VIF filter step
`threshold`	threshold VIF value used as input parameter

Help Index

Clubbing class of categorical variables with low population percentage with another class of similar event rate
IV table for individual categorical variable
Clubbing class of a categorical variable with low population percentage with another class of similar event rate
Variable reduction based on Cramer's V filter
Pairwise Cramer's V among a list of categorical variables
Cramer's V value between two categorical variables
Getting the split value for terminal nodes from decision tree
Recursive Decision Tree partitioning with monotonic event rate along with IV table for individual numerical variable
Creates confusion matrix and its related measures
Creates random index for k-fold cross validation
Computes error measures between observed and predicted values
Calculating mode value of a vector