Search results 1-2 of 2

Overcoming the class imbalance in modeling the credit default

Roskoshenko V.Vl. Lomonosov Moscow State University (MSU), Moscow, Russian Federation

Journal: Finance and Credit, #11, 2019

Subject In modeling credit default, the banking sector faces class imbalance in its samples. Data pre-processing is traditionally the first choice in bank modeling for overcoming class imbalance. Existing studies that compare such approaches cover only a few methods or focus on very specific data. Moreover, previous researchers have overlooked approaches that combine data pre-processing with ensemble-based solutions (stacking).
Objectives The study aims to find, within each group of approaches, the option best suited to overcoming class imbalance in bank data on retail lending.
Methods The study employs mathematical modeling, statistical analysis and content analysis of sources.
Results Although mathematically rather complex, the EditedNearestNeighbours approach proved the most convenient for data pre-processing. It removes observations of the majority class that are inconsistent with their surrounding neighbourhood, defined by each observation's nearest neighbours. Among combinations of data pre-processing and stacking, RandomOverSampler performed best. It increases the share of the minority class by random resampling and is the simplest of the approaches.
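The two methods named above can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not the author's code: the abstract does not show the study's implementation, although the names EditedNearestNeighbours and RandomOverSampler match classes in the Python imbalanced-learn package. The helper functions `edited_nearest_neighbours` and `random_over_sample` are hypothetical, the data are toy values, and the editing rule is assumed to be Wilson's majority vote over k = 3 Euclidean neighbours (library implementations may apply a stricter rule).

```python
import math
import random
from collections import Counter


def edited_nearest_neighbours(X, y, majority_label, k=3):
    """Hypothetical helper: drop majority-class rows whose k nearest neighbours mostly carry another label."""
    keep_X, keep_y = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != majority_label:
            # Minority rows are never removed by this editing step.
            keep_X.append(xi)
            keep_y.append(yi)
            continue
        # The k rows closest to xi in Euclidean distance, excluding xi itself.
        others = [j for j in range(len(X)) if j != i]
        neighbours = sorted(others, key=lambda j: math.dist(xi, X[j]))[:k]
        agreeing = sum(1 for j in neighbours if y[j] == yi)
        # Assumed rule: keep the row only if a strict majority of its neighbours share its label.
        if 2 * agreeing > len(neighbours):
            keep_X.append(xi)
            keep_y.append(yi)
    return keep_X, keep_y


def random_over_sample(X, y, minority_label, seed=0):
    """Hypothetical helper: duplicate randomly drawn minority rows until both classes are equally frequent."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority_idx = [i for i, label in enumerate(y) if label == minority_label]
    majority_count = max(c for label, c in counts.items() if label != minority_label)
    # Draw with replacement from the existing minority rows.
    extra = [rng.choice(minority_idx) for _ in range(majority_count - counts[minority_label])]
    return X + [X[i] for i in extra], y + [y[i] for i in extra]


# Toy data: class 0 is the majority; (5.5, 5.5) is a class-0 point inside the class-1 cluster.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 2), (5.5, 5.5), (5, 5), (5, 6), (6, 5)]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]

X_enn, y_enn = edited_nearest_neighbours(X, y, majority_label=0)
print(Counter(y_enn))  # Counter({0: 5, 1: 3}); only (5.5, 5.5) is dropped

X_ros, y_ros = random_over_sample(X, y, minority_label=1)
print(Counter(y_ros))  # Counter({0: 6, 1: 6})
```

Editing removes only the stray majority point that sits inside the minority cluster, while random oversampling leaves the majority class untouched and adds repeated copies of existing minority rows rather than new information, which is what makes it so simple.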
Conclusions and Relevance The article presents an exhaustive comparison of approaches to class imbalance in samples. I identified the most appropriate data pre-processing approach and the best combination of data pre-processing with an ensemble-based solution. The findings can be used in credit scoring and in statistical modeling whenever binary classification is required.

Binning of variables: A compromise between the efficiency of the model and regulation

Roskoshenko V.V. Lomonosov Moscow State University (MSU), Moscow, Russian Federation

Journal: Finance and Credit, #9, 2019

Subject The research addresses the discretization of the factors of default on loan repayment claims, an aspect of statistical modeling in banking.
Objectives The research identifies several valid discretization algorithms for credit scoring and chooses the most appropriate one. It also demonstrates that discretization is necessary for building a predictive model when logistic regression is applied.
Methods I conducted statistical analysis and content analysis of sources.
Results The statistical analysis showed that the proposed algorithm, TreeR, is the most appropriate among the algorithms that comply with Basel II requirements and existing criteria. TreeR splits a continuous variable at the cut points produced by growing a decision tree for a binary dependent variable. It is a new solution to the discretization of continuous variables. A distinguishing feature of TreeR is that it runs on open-access software and relies on publicly available libraries.
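The abstract describes TreeR only at a high level, so the following is a hypothetical sketch of the general idea it names: growing a depth-limited classification tree on one continuous variable against the binary default flag and using the tree's split thresholds as bin boundaries. The Gini split criterion, the `max_depth` and `min_leaf` limits, the function names and the toy data are all assumptions; TreeR itself builds on publicly available libraries rather than hand-written code like this.

```python
from bisect import bisect_left


def gini(labels):
    """Gini impurity of a list of 0/1 default flags."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(xs, ys, min_leaf):
    """Threshold minimising the weighted Gini impurity of the two children, or None."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best, best_score = None, gini(ys)  # a split must improve on the parent node
    for i in range(min_leaf, n - min_leaf + 1):  # i = size of the left child
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut is possible between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best, best_score = (pairs[i - 1][0] + pairs[i][0]) / 2, score
    return best


def tree_cut_points(xs, ys, max_depth=2, min_leaf=2):
    """Sorted cut points of a depth-limited binary tree grown on a single variable."""
    if max_depth == 0:
        return []
    threshold = best_split(xs, ys, min_leaf)
    if threshold is None:
        return []
    left = [(x, y) for x, y in zip(xs, ys) if x <= threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x > threshold]
    cuts = [threshold]
    for part in (left, right):
        cuts += tree_cut_points([x for x, _ in part], [y for _, y in part],
                                max_depth - 1, min_leaf)
    return sorted(cuts)


def assign_bin(x, cuts):
    """Bin index 0..len(cuts); a value equal to a cut point falls into the lower bin."""
    return bisect_left(cuts, x)


# Toy data: default flags are 1 at both ends of the range and 0 in the middle.
x = list(range(1, 13))
default = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
cuts = tree_cut_points(x, default)
print(cuts)  # [4.5, 8.5]
print([assign_bin(v, cuts) for v in (3, 6, 10)])  # [0, 1, 2]
```

In the toy data the default rate is high at both ends of the range and low in the middle. A single linear term for `x` in a logistic regression is monotone in `x` and cannot represent that U-shape, whereas indicators for the three bins can.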
Conclusions and Relevance The findings can be used for credit scoring as well as for statistical modeling based on the logistic regression.
