
Credit Risk Assessment using Statistical and Machine Learning Methods as an Ingredient for Risk Modeling of Financial Intermediaries

Jorge Galindo - Harvard University and Pablo Tamayo - Thinking Machines Corp.


Credit risk assessment of financial intermediaries is an area of renewed interest for academics, regulatory authorities, and financial intermediaries themselves. This interest is justified by the financial crises of the 1980s and 1990s. Examples include the U.S. savings and loan (S&L) crisis, with an estimated cost in the hundreds of billions of dollars, and the roughly $16 billion that the Nordic countries injected into their financial systems from 1989 to 1992 to keep them from bankruptcy. Japan's bad loans were estimated at $160 to $240 billion in October 1993. Mexico has spent at least $20 billion trying to keep its financial system from collapsing, and the risk has not yet disappeared. Besides these highly publicized cases there are many others of smaller magnitude in which a more accurate estimation of risk, and its use in global financial risk models, could translate into significant savings or reduced losses. Our final goal is to build accurate and realistic models of the risk of financial institutions from the perspective of a regulatory authority. One important ingredient for this goal is a set of accurate predictors of individual risk in the credit portfolio, together with a good methodology for generating them. This is the main subject of this paper.

We make a comparative analysis of different classification methods on real credit datasets from the database of a large commercial bank. The motivation is to understand the limitations and potential of different methods, in particular those based on machine learning techniques. This is done through a systematic study and comparison with traditional approaches based on statistical classification techniques. In addition, we advocate a multi-strategy approach in which several methods are applied to the same data to find the best one, and in which the methods are sometimes combined in the final analysis. This is justified by the fact that it is very hard to select an optimal model a priori without knowing the actual complexity of a particular problem or dataset.
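As a rough illustration of the multi-strategy idea, the sketch below applies several classifiers to the same data with the same cross-validation folds and keeps the one with the lowest estimated error. Everything in it is an assumption made for illustration: the synthetic one-feature data, the three toy methods (`majority`, `mean_threshold`, `nearest_neighbor`), and the helper `cross_val_error` are hypothetical stand-ins, not the methods or datasets used in the study.

```python
import random

def majority(train):
    """Baseline: always predict the most common training label."""
    positives = sum(1 for _, y in train if y)
    label = positives * 2 >= len(train)
    return lambda x: label

def mean_threshold(train):
    """Predict 'bad' when x exceeds the midpoint of the two class means."""
    pos = [x for x, y in train if y]
    neg = [x for x, y in train if not y]
    if not pos or not neg:
        return majority(train)
    t = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: x > t

def nearest_neighbor(train):
    """Memory-based method: copy the label of the closest training point."""
    return lambda x: min(train, key=lambda row: abs(row[0] - x))[1]

def cross_val_error(fit, data, k=5):
    """k-fold error estimate; every method sees exactly the same folds."""
    errors = 0
    for i in range(k):
        test = data[i::k]
        train = [row for j, row in enumerate(data) if j % k != i]
        predict = fit(train)
        errors += sum(1 for x, y in test if predict(x) != y)
    return errors / len(data)

# Synthetic stand-in for a credit dataset: one numeric score per loan,
# label True = hypothetical "bad" loan.
rng = random.Random(0)
data = []
for _ in range(300):
    y = rng.random() < 0.5
    data.append((rng.gauss(1.0 if y else -1.0, 1.0), y))

methods = {
    "majority": majority,
    "mean_threshold": mean_threshold,
    "nearest_neighbor": nearest_neighbor,
}
results = {name: cross_val_error(fit, data) for name, fit in methods.items()}
best = min(results, key=results.get)

for name, err in results.items():
    print(f"{name:17s} cv error = {err:.3f}")
print("selected:", best)
```

Sharing the folds across methods is the point of the design: differences in the error estimates then reflect the methods rather than the way the data happened to be split.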

Past studies comparing different approaches to the classification problem have been rightly criticized because most of them used only one or two techniques and were not done in a systematic and consistent way. This part of our study tries to overcome this problem by analyzing a variety of methods from classical and modern statistics, state-of-the-art machine learning, memory-based reasoning, and neural networks. There are several other advantages to comparing different methods in the same study: the pre-processing of the data is more homogeneous, and there is less bias because the choice of datasets and error estimates is the same for every method. We also analyze the entire process of data preparation, cleaning, etc., and the implications of the models' results for the organization of a data warehouse and the data collection process. In addition, we analyze the behavior of learning curves of the generalization error versus sample size as a way to obtain rough empirical estimates of complexity.
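A learning curve of the kind mentioned above can be sketched as follows: train on samples of increasing size, measure the error on a fixed held-out set, and average over repeated draws. This is a minimal sketch under stated assumptions. The synthetic data generator `make_data`, the simple threshold classifier, and the chosen sample sizes are all hypothetical and stand in for the real credit data and models.

```python
import random

def make_data(n, rng):
    """Synthetic two-class data: one noisy numeric feature per loan."""
    data = []
    for _ in range(n):
        label = rng.random() < 0.5          # True = hypothetical "bad" loan
        mean = 1.0 if label else -1.0
        data.append((rng.gauss(mean, 1.5), label))
    return data

def fit_threshold(train):
    """Midpoint between the class means; assumes 'bad' loans score higher."""
    pos = [x for x, y in train if y]
    neg = [x for x, y in train if not y]
    if not pos or not neg:
        return 0.0                          # fallback when a class is missing
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def error_rate(threshold, test):
    """Fraction of held-out cases the threshold rule misclassifies."""
    wrong = sum(1 for x, y in test if (x > threshold) != y)
    return wrong / len(test)

def learning_curve(sizes, n_test=2000, repeats=20, seed=0):
    """Average held-out error for each training-set size."""
    rng = random.Random(seed)
    test = make_data(n_test, rng)           # fixed held-out set
    curve = []
    for n in sizes:
        errs = []
        for _ in range(repeats):
            train = make_data(n, rng)
            errs.append(error_rate(fit_threshold(train), test))
        curve.append((n, sum(errs) / len(errs)))
    return curve

curve = learning_curve([10, 40, 160, 640])
for n, err in curve:
    print(f"n={n:4d}  held-out error={err:.3f}")
```

How quickly such a curve flattens, and at what error level, gives a rough empirical sense of how much data a given model needs for a given problem.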

Another issue of particular importance for financial decision making and risk management addressed in our study is the transparency, or degree of interpretability, of models. Transparent models are those that can be conceptually understood by the decision maker. An example of a transparent model is a decision tree expressed in terms of profiles or rules. Other models, such as neural networks, can act as very accurate black boxes but are opaque in the sense that they provide no clues about the basis for their classifications.
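To make the notion of a transparent model concrete, the sketch below fits the simplest possible decision tree (a single split, often called a decision stump) and prints it as a rule a decision maker could read. The feature names `debt_to_income` and `months_at_job`, the eight toy loans, and the helpers `fit_stump` and `as_rule` are hypothetical and chosen only for illustration; they do not come from the bank data in the study.

```python
def count_errors(rows, j, t, above_is_bad):
    """Training errors of the rule 'feature j > t means bad' (or its mirror)."""
    errors = 0
    for features, bad in rows:
        predicted_bad = (features[j] > t) == above_is_bad
        if predicted_bad != bad:
            errors += 1
    return errors

def fit_stump(rows, feature_names):
    """Find the single (feature, threshold) split with the fewest errors."""
    best = None
    for j, name in enumerate(feature_names):
        values = sorted({features[j] for features, _ in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            for above_is_bad in (True, False):
                errors = count_errors(rows, j, t, above_is_bad)
                if best is None or errors < best[0]:
                    best = (errors, name, t, above_is_bad)
    return best

def as_rule(stump):
    """Render the fitted split as a human-readable IF/THEN rule."""
    _, name, t, above_is_bad = stump
    op = ">" if above_is_bad else "<="
    return f"IF {name} {op} {t:g} THEN bad_loan ELSE good_loan"

# Toy, made-up loans: (debt_to_income, months_at_job), bad-loan flag.
feature_names = ["debt_to_income", "months_at_job"]
rows = [
    ((0.10, 60), False),
    ((0.15, 24), False),
    ((0.20, 48), False),
    ((0.25, 6), False),
    ((0.45, 36), True),
    ((0.50, 12), True),
    ((0.60, 3), True),
    ((0.55, 72), True),
]

stump = fit_stump(rows, feature_names)
print(as_rule(stump))
print("training errors:", stump[0])
```

On this toy data the stump splits on `debt_to_income`. The printed rule is the whole model, which is what distinguishes it from a black box whose weights carry no direct meaning for the decision maker.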


Scheduled for Session 2.3 Financial Models - III
