Credit Risk and Machine Learning Concepts -6

Geoff Leigh
Published in Analytics Vidhya
Feb 12, 2020 · 12 min read


What is the machine learning component in this area?

A standard approach to designing an AI/ML solution often includes the creation of a decision tree, which maps out the functions and sequence of actions that emerge from a Design Thinking / Behaviour Driven Development exercise. This is not at the level of a sequence diagram; it is more a mapping of the process and decision flows that a human would otherwise follow. The rules are straightforward to code, so an automated process can apply them with additional insight, consider far more data at once, and do so in near-instantaneous time compared with a human.

Example of a Decision Tree

The model presented below signifies a giant leap forward in the practice of risk modelling. All the approaches previously presented presuppose that the parameters Θ of the model are independent of the set of observed data D, hence PD = p(x|Θ, D) = p(x|Θ).

Non-parametric models relax this assumption, so that Θ = f(D). The explanatory power of the model, i.e. the amount of information it can capture about the data, therefore also depends on the cardinality of D: the more data, the more accurate the model can be.
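To make the contrast concrete, a parametric model such as logistic regression compresses D into a fixed parameter vector Θ, after which the data can in principle be discarded. The sketch below uses purely synthetic data and a made-up covariate name (nowc_ratio), simply to show the shape of such a parametric PD model in R:

# Parametric PD model: logistic regression on one synthetic covariate.
# 'nowc_ratio' and the coefficients are hypothetical, for illustration only.
set.seed(42)
n <- 1000
nowc_ratio <- rnorm(n)                                        # synthetic covariate
default    <- rbinom(n, 1, plogis(-1.5 - 0.8 * nowc_ratio))   # synthetic default flag
fit <- glm(default ~ nowc_ratio, family = binomial)           # Θ = the fitted coefficients
predict(fit, newdata = data.frame(nowc_ratio = 0.2),
        type = "response")                                    # PD estimate for a new firm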

Decision/classification trees consist of a series of conditional yes/no clauses, based on the vector of covariates, that classify customers into groups. The effect of the neural network, in turn, is to rapidly classify an entity, based on multiple variables, alongside other entities that have failed or are unable to meet their debts.

The new model being developed looks at the Net Operating Working Capital (NOWC) and some additional trends to determine ability to pay, together with available payment history, industry trends, economic trends, court actions and verification of going-concern status. NOWC is a metric that measures a company’s ability to pay off all its working liabilities with its operating assets. This is an important metric because it shows the leverage of the company and the amount of current, working assets.

It also shows how a company operates using its resources and how efficiently the company can adapt to unexpected events and new opportunities. This is evident in the equation itself.

The net operating working capital formula is calculated by subtracting working liabilities from working assets like this:

Cash + Accounts Receivable + Inventory − (Accounts Payable + Accrued Expenses), or equivalently Current Operating Assets − Current Operating Liabilities.

This metric is more closely tied to cash flows than the net working capital (NWC) calculation, because NWC includes all current assets and current liabilities. Because of this, NOWC is often used to calculate free cash flow. For example:

Let’s assume Bob’s Transport and supplies has the following assets on its balance sheet:

  • Cash: $100,000
  • Accounts Receivable: $20,000
  • Inventory/Fix: $500,000
  • Accounts Payable: $300,000
  • Accrued Expenses: $100,000

Bob would calculate his NOWC as follows:

$100,000 + $20,000 + $500,000 − $300,000 − $100,000 = $220,000

This means that Bob could pay off all his working liabilities with only a portion of his working assets. Thus, if his vendors or creditors called in all his debts at the same time, he would be able to pay them off and still have a good amount of his working assets left to run the business.

Net operating working capital (NOWC) is the excess of operating current assets over operating current liabilities. In most cases it equals cash plus accounts receivable plus inventories minus accounts payable minus accrued expenses.

Operating current assets are assets that are (a) needed to support the business operations, and (b) expected to be converted to cash in the next 12 months. They do not include current financial investments.

Operating current liabilities are liabilities that are (a) undertaken to carry out the business operations, and (b) expected to be settled in the next 12 months. They exclude any current loans or interest-bearing liabilities.

Net operating working capital is different from (net) working capital which simply equals current assets minus current liabilities. NOWC is an intermediate input in the calculation of free cash flow. Free cash flow equals operating cash flow minus gross investment in operating assets minus investment in net working capital.
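For completeness, the NOWC calculation can be expressed in a couple of lines of R; the figures below are simply Bob’s balance-sheet numbers from the example above:

# NOWC = operating current assets minus operating current liabilities
nowc <- function(cash, receivables, inventory, payables, accrued) {
  (cash + receivables + inventory) - (payables + accrued)
}
nowc(cash = 100000, receivables = 20000, inventory = 500000,
     payables = 300000, accrued = 100000)   # returns 220000, matching Bob's example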

Decision/Classification Trees

Decision or classification trees consist of a series of conditional, usually yes/no, clauses based on the vector of covariates, which classify entities into groups. Take as an example a binary classification tree whose covariates include ‘size’ (reported full-time employee equivalents and trends), age of the enterprise, total net worth less intangibles and trends, any county court judgements including disputes and bankruptcy protection or reorganisation filings, trade and press coverage sentiment analysis (including key executives), and any negative criminal activities and court cases.

As in the diagram deciding on fruit identification, each node divides the set of customers into different subsets until an end node is reached. Customers are subdivided into many classes, to each of which a Probability of Default (PD) must be assigned. This does not assign a simple score to a single customer, but aligns and re-aligns a categorisation of ‘likely to default’ based on observed data; it therefore does not allow discrimination between customers within the same category. A minor downside, since the model is not based on any statistical assumption, is that it is not possible to assess the stability of the framework with statistical relevance; robustness is therefore linked to the goodness of the training sample. The model helps to frame potentially complex or nonlinear relationships among the variables: for instance, a covariate might become relevant only at a certain node of the tree and only for a specific subset of customers. The ability to clearly model interactions among covariates is a further strength of decision trees, although it turns out to be particularly useful only if the interactions between variables are somehow known a priori.
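As a minimal sketch of such a tree, the rpart package can fit yes/no splits on a handful of covariates and assign a PD to each end-node group. The data, variable names and coefficients below are entirely synthetic and illustrative; they are not the production feature set described above:

# Illustrative classification tree on synthetic firm data (not real data)
library(rpart)
set.seed(7)
n <- 2000
firms <- data.frame(
  nowc_ratio   = rnorm(n),            # NOWC scaled by operating assets (hypothetical)
  years_active = rpois(n, 12),        # age of the enterprise
  employees    = rpois(n, 50),        # reported FTE count
  court_action = rbinom(n, 1, 0.05)   # any adverse judgement on record
)
p_default <- plogis(-2 - 1.2 * firms$nowc_ratio + 1.5 * firms$court_action)
firms$default <- factor(rbinom(n, 1, p_default),
                        levels = c(0, 1), labels = c("active", "default"))
tree <- rpart(default ~ ., data = firms, method = "class")
tree                                        # prints the yes/no splits at each node
predict(tree, firms[1:3, ], type = "prob")  # PD assigned to each end-node group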

Neural Networks

Neural networks are so named because they naively appear to simulate the way the brain works. A more formal name for the type of neural network used here is the multilayer perceptron. A perceptron (see figure below) is the elementary unit of the system, constituted by n axons and a node which represent, respectively, n weights and one elementary operation that takes the inputs and returns the output of that operation on the weighted inputs. The value output from each node is “filtered” via a smoothing function, which rescales the output so that the result does not diverge across multiple layers and become too heavy for a processing unit.

Trivial Example of a Perceptron
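A single perceptron of this kind reduces to a weighted sum of the inputs passed through a smoothing function. A minimal base-R sketch, with purely hypothetical weights and inputs:

# One perceptron: weighted sum of inputs, then a sigmoid "smoothing" function
perceptron <- function(x, w, b) {
  z <- sum(w * x) + b        # weighted sum of the n inputs plus a bias
  1 / (1 + exp(-z))          # sigmoid keeps the output bounded in (0, 1)
}
x <- c(0.4, 1.2, -0.7)        # example covariate vector (hypothetical values)
w <- c(0.8, -0.5, 0.3)        # one weight per "axon" (hypothetical values)
perceptron(x, w, b = 0.1)     # single scalar output passed to downstream nodes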

In fact, neural networks can be described as a series of concatenated matrix multiplications applied to the vector of covariates x. The output of the model can be interpreted as the PD, related to the input through a series of intermediate nodes arranged in layers (a concatenated series of perceptrons), each of which receives as input either the vector x or the output of other nodes and, in turn, passes a value to one or more downstream nodes or to the final output. No preliminary assumption is made on the structure of the intra-network relationships. Indeed, from its initial state, the network can be “trained” with different samples of defaulted and non-defaulted companies. The training is an iterative process called backpropagation, through which an algorithm finds the optimal weights of the node connections by comparing the output of the current state of the network with the observed outcomes.
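Expressed as those concatenated matrix multiplications, a single forward pass through a small two-hidden-layer network might look like the sketch below. The dimensions and random weights are illustrative only; the full TensorFlow training code appears at the end of this article:

# Forward pass of a toy MLP: 12 covariates -> 50 -> 25 -> 2 output classes
set.seed(1)
x  <- matrix(rnorm(12), nrow = 1)              # one company's covariate vector
W1 <- matrix(rnorm(12 * 50), 12, 50); b1 <- rep(0, 50)
W2 <- matrix(rnorm(50 * 25), 50, 25); b2 <- rep(0, 25)
W3 <- matrix(rnorm(25 * 2),  25, 2);  b3 <- rep(0, 2)
relu    <- function(z) pmax(z, 0)              # activation at each hidden node
softmax <- function(z) exp(z) / sum(exp(z))    # rescales the final layer to probabilities
h1  <- relu(x %*% W1 + b1)                     # first hidden layer
h2  <- relu(h1 %*% W2 + b2)                    # second hidden layer
out <- softmax(h2 %*% W3 + b3)                 # (P(active), P(default)) - the PD estimate
out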

A modification of the Net Operating Working Capital (NOWC) approach is suggested. This reduces the feature set from the up-to-24 features used by the various credit rating agencies. A moving four-year window of financials is first evaluated for trends. Employee count, revenue, size and years in business are the initial qualifiers for rating evaluation: if a threshold is not met, Credit Advisory status is returned after completing the rating, or Credit Challenged if the company is no longer active (in liquidation, dissolved, or bankrupt). The timeliness, or ‘freshness’, of the financials has a downward effect on the credit risk factors as they become more stale.
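As an illustration of how those initial qualifiers might gate the rating, the sketch below uses hypothetical threshold values (the actual thresholds are not specified in this article):

# Hypothetical gating logic for the initial qualifiers; thresholds are placeholders only
rate_company <- function(employees, revenue, years_in_business, active,
                         min_employees = 5, min_revenue = 250000, min_years = 2) {
  if (!active) return("Credit Challenged")     # in liquidation, dissolved or bankrupt
  if (employees < min_employees || revenue < min_revenue ||
      years_in_business < min_years) {
    return("Credit Advisory")                   # rated, but returned as advisory status
  }
  "Proceed to full NOWC / neural-network rating"
}
rate_company(employees = 3, revenue = 500000, years_in_business = 4, active = TRUE)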

Sector evaluation and general business conditions would be the next evaluation. This is modelled into the multilayer perceptron structure for the neural network model, similar to the framework below:

The blue lines represent the feed-forward flow propagating the input along the network, while the orange arrows identify the paths of the back-propagation process.

Some R code used as the basis of such a neural network, currently in development, is shown at the end. Key input comes from the MSc dissertation on supply chain-based creditworthiness by Gabriel Bonomi Boseggia of the Polytechnic of Milan, 2017. Some other content has also been sourced from this reference.

This is the sixth installment of my blogs on Credit Risk and Machine Learning. The next installment will consider the anatomy of failing and failed companies, and what the NOWC neural-net perceptron approach may have indicated, available at this link:

Credit Risk and Machine Learning Concepts -7

The previous 5 installments may be found here:

https://medium.com/@geoff.leigh19/credit-risk-and-machine-learning-concepts-85ef47c978c7?source=friends_link&sk=5249acc679330bd64c76bcae1dc074d1

https://medium.com/@geoff.leigh19/credit-risk-and-machine-learning-concepts-2-fc37e1a05183?sk=94ef606e1c60e2cf1522b9c38a5e144e

https://medium.com/analytics-vidhya/credit-risk-and-machine-learning-concepts-3-d2bb2f39d843

https://medium.com/analytics-vidhya/credit-risk-and-machine-learning-concepts-4-3c44b479a3d1?source=friends_link&sk=cf6fe8b0a96d01c68971f72cbc179229

https://medium.com/analytics-vidhya/credit-risk-and-machine-learning-concepts-5-88f2dc1e18e2?source=friends_link&sk=2a4015bc86ee6071716865356ffb1a0d

Sample R Code fragments:

####
# load library
####
library(tensorflow)

####
# Import data
####
companies <- read.csv("companies_large_norm.csv", header = TRUE)
companies <- companies[, c(1, 2, 8:19)]
active  <- companies[which(companies$Active == 1), ]
default <- companies[which(companies$Default == 1), ]

### Sample 5000 active and 5000 default companies for testing purposes
smp_act <- sample(nrow(active), size = 5000)
smp_def <- sample(nrow(default), size = 5000)
test  <- rbind(active[smp_act, ], default[smp_def, ])
test  <- test[sample(nrow(test)), ]
train <- rbind(active[-smp_act, ], default[-smp_def, ])
train <- train[sample(nrow(train)), ]
train_active  <- active[-smp_act, ]
train_default <- default[-smp_def, ]
test_values <- as.matrix(test[, 3:ncol(test)])
test_labels <- as.matrix(test[, 1:2])

####
# Build the multilayer perceptron
####
n_classes  <- 2L
n_nodes_l1 <- 50L
n_nodes_l2 <- 25L
keep_prob  <- 0.97
x_length   <- as.integer(ncol(companies) - 2)

### Initialise weights and biases with values from a truncated standard normal
weight_variable <- function(shape) {
  initial <- tf$truncated_normal(shape, stddev = 0.1)
  tf$Variable(initial)
}
bias_variable <- function(shape) {
  initial <- tf$constant(0.1, shape = shape)
  tf$Variable(initial)
}

### Define connections within the network
x <- tf$placeholder(tf$float32, shape(NULL, x_length))
hl1_W <- tf$Variable(tf$truncated_normal(shape(x_length, n_nodes_l1),
                                         stddev = 0.1), name = "W_hl_1")
hl1_b <- tf$Variable(tf$zeros(shape(n_nodes_l1)), name = "B_hl_1")
hl2_W <- tf$Variable(tf$truncated_normal(shape(n_nodes_l1, n_nodes_l2),
                                         stddev = 0.1), name = "W_hl_2")
hl2_b <- tf$Variable(tf$zeros(shape(n_nodes_l2)), name = "B_hl_2")
out_W <- tf$Variable(tf$truncated_normal(shape(n_nodes_l2, n_classes),
                                         stddev = 0.1), name = "W_output_layer")
out_b <- tf$Variable(tf$zeros(shape(n_classes)), name = "B_output_layer")

### Define activation and dropout per layer, then the softmax output
l1 <- tf$nn$relu(tf$add(tf$matmul(x, hl1_W), hl1_b))
drop1 <- tf$nn$dropout(l1, keep_prob)
l2 <- tf$nn$relu(tf$add(tf$matmul(drop1, hl2_W), hl2_b))
drop2 <- tf$nn$dropout(l2, keep_prob)
out <- tf$add(tf$matmul(drop2, out_W), out_b)
y <- tf$nn$softmax(out)

### Placeholder for the output variable (labels)
y_ <- tf$placeholder(tf$float32, shape(NULL, n_classes))

### Error function
cross_entropy <- tf$reduce_mean(-tf$reduce_sum(y_ * tf$log(y),
                                               reduction_indices = 1L))

### Declaration of the backpropagation optimisation algorithm
### (ADaptive Moment estimation)
optimizer  <- tf$train$AdamOptimizer()
train_step <- optimizer$minimize(cross_entropy)

### Store and reload a partially trained model if needed (do not run the first time)
# loader <- tf$train$import_meta_graph("folder")
# loader$restore(sess, tf$train$latest_checkpoint("folder"))
# saver$restore(sess, "folder")

### Do not run when loading a saved model
init <- tf$global_variables_initializer()
sess <- tf$Session()
sess$run(init)

### Define accuracy metrics
correct_prediction <- tf$equal(tf$argmax(y, 1L), tf$argmax(y_, 1L))
accuracy <- tf$reduce_mean(tf$cast(correct_prediction, tf$float32))

### Define summary statistics to monitor training (accuracy, cross-entropy and weights)
summary_accuracy     <- tf$summary$scalar("accuracy", accuracy)
summary_CrossEntropy <- tf$summary$scalar("cross entropy", cross_entropy)
summary_hl1_W <- tf$summary$histogram("weights1", hl1_W)
summary_hl2_W <- tf$summary$histogram("weights2", hl2_W)
summary_output_layer_W <- tf$summary$histogram("weightsOut", out_W)
summary_w <- c(summary_hl1_W, summary_hl2_W, summary_output_layer_W)
summary_weights <- tf$summary$merge(summary_w)
log_writer <- tf$summary$FileWriter(paste0("folder", j))  # 'j' is a run index assumed to be set elsewhere
saver <- tf$train$Saver()

####
# Train the model
####
for (i in 1:5000) {

  ### Random selection of a balanced training batch
  ### (as many active firms as defaulted firms)
  train <- rbind(train_default,
                 train_active[sample(nrow(train_active), nrow(train_default)), ])

  ### Shuffle
  train <- train[sample(nrow(train)), ]

  ### Separate labels (columns 1 and 2) from input values (remaining columns)
  train_values <- as.matrix(train[, 3:ncol(train)])
  train_labels <- as.matrix(train[, 1:2])

  batch_xs <- train_values
  batch_ys <- train_labels

  sess$run(train_step, feed_dict = dict(x = batch_xs, y_ = batch_ys))

  ### Save accuracy on the testing sample every 10 iterations
  if (i %% 10 == 0) {
    acc_summary <- sess$run(summary_accuracy,
                            feed_dict = dict(x = test_values, y_ = test_labels))
    log_writer$add_summary(acc_summary, global_step = i)

    error <- sess$run(summary_CrossEntropy,
                      feed_dict = dict(x = test_values, y_ = test_labels))
    log_writer$add_summary(error, global_step = i)

    weights <- sess$run(summary_weights)
    log_writer$add_summary(weights, global_step = i)
  }

  ### Save the entire model every 100 iterations
  if (i %% 100 == 0) {
    saver$save(sess, "folder", global_step = i)
  }
}

####
# Show stats
####

### Print confusion matrix, balanced accuracy and accuracy.
### argmin over the (Active, Default) softmax output aligns predictions
### with the Active indicator held in column 1 of the labels.
pred <- sess$run(tf$argmin(y, 1L), feed_dict = dict(x = test_values))
conf.mat <- table(Predictions = pred, Actual = test_labels[, 1])
conf.mat

bal.accuracy <- (conf.mat[1, 1] / sum(conf.mat[, 1]) +
                 conf.mat[2, 2] / sum(conf.mat[, 2])) / 2
bal.accuracy

sess$run(accuracy, feed_dict = dict(x = test_values, y_ = test_labels))
