Finance and Credit


Binning of variables: A compromise between the efficiency of the model and regulation

Vol. 25, Iss. 9, SEPTEMBER 2019

Received: 5 June 2019

Received in revised form: 19 June 2019

Accepted: 3 July 2019

Available online: 30 September 2019

Subject Heading: Banking

JEL Classification: G21, G28

Pages: 2040–2053

Roskoshenko V.V. Lomonosov Moscow State University (MSU), Moscow, Russian Federation

ORCID id: not available

Subject The research addresses the discretization of default risk factors for loan repayment claims as part of statistical modeling in banking.
Objectives The research identifies several valid discretization algorithms for credit scoring and selects the most appropriate one. It also demonstrates that discretization is necessary when a predictive model is built with logistic regression.
Methods For the purposes of the research, I conducted statistical analysis and content analysis of sources.
Results The statistical analysis showed that the proposed algorithm (TreeR) is the most appropriate among the algorithms that comply with Basel II requirements and existing criteria. TreeR splits a continuous variable by growing decision trees for a binary dependent variable. The algorithm is a new approach to discretizing continuous variables. What distinguishes TreeR is that it runs on freely available software and relies on publicly available libraries.
Conclusions and Relevance The findings can be used for credit scoring as well as for statistical modeling based on logistic regression.
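
The general idea of tree-based discretization mentioned in the Results can be illustrated with a minimal sketch. This is not the author's TreeR implementation, whose details the abstract does not give. It is a generic single-variable decision tree that picks cut points by Gini impurity decrease for a binary target. The function names (`tree_cut_points`, `assign_bin`), the Gini criterion, and the `max_depth` and `min_bin_size` parameters are all assumptions made for illustration.

```python
from bisect import bisect_left


def gini(pos, n):
    """Gini impurity of a node holding `pos` positives among `n` observations."""
    if n == 0:
        return 0.0
    p = pos / n
    return 2.0 * p * (1.0 - p)


def best_split(pairs, min_bin_size):
    """Find the threshold with the largest impurity decrease.

    `pairs` is a list of (x, y) tuples sorted by x, with y in {0, 1}.
    Returns (threshold, gain), or None if no admissible split improves purity.
    """
    n = len(pairs)
    total_pos = sum(y for _, y in pairs)
    parent = gini(total_pos, n)
    best = None
    left_pos = 0
    for i in range(1, n):
        left_pos += pairs[i - 1][1]
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot cut between identical values
        if i < min_bin_size or n - i < min_bin_size:
            continue  # hypothetical constraint: keep every bin reasonably large
        weighted = (i * gini(left_pos, i)
                    + (n - i) * gini(total_pos - left_pos, n - i)) / n
        gain = parent - weighted
        if gain > 1e-12 and (best is None or gain > best[1]):
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2.0, gain)
    return best


def tree_cut_points(x, y, max_depth=2, min_bin_size=2):
    """Grow a tree on one continuous variable and return its sorted cut points."""
    cuts = []

    def grow(segment, depth):
        if depth == 0:
            return
        split = best_split(segment, min_bin_size)
        if split is None:
            return
        threshold = split[0]
        cuts.append(threshold)
        grow([p for p in segment if p[0] <= threshold], depth - 1)
        grow([p for p in segment if p[0] > threshold], depth - 1)

    grow(sorted(zip(x, y)), max_depth)
    return sorted(cuts)


def assign_bin(value, cuts):
    """Map a value to its bin index; values equal to a cut go to the lower bin."""
    return bisect_left(cuts, value)
```

For example, `tree_cut_points([1, 2, 3, 4, 5, 6, 7, 8], [0, 0, 0, 0, 1, 1, 1, 1])` yields the single cut point `[4.5]`, and `assign_bin` then maps each observation to bin 0 or bin 1. The resulting bin index can be used as a categorical input to a logistic regression.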

Keywords: credit scoring, logistic regression, discretization, data preprocessing, continuous variable





ISSN 2311-8709 (Online)
ISSN 2071-4688 (Print)
