To balance the trade-off between the decrease in revenue and the decrease in cost, an optimization problem has to be solved by adjusting the threshold and seeking the optimum.

If “Settled” is treated as positive and “Past Due” as negative, then, following the layout of the confusion matrix plotted in Figure 6, the four cells are True Positive (TP), False Positive (FP), False Negative (FN) and True Negative (TN). Consistent with the confusion matrices plotted in Figure 5, TP is the number of good loans correctly identified (hits), and FP is the number of defaults missed. Our company is most interested in these two cells. To normalize the counts, two widely used metrics are defined: True Positive Rate (TPR) and False Positive Rate (FPR). Their equations are shown below:

$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}$$

In this application, TPR is the hit rate of good loans, and it represents the capacity to earn money from loan interest. FPR is the miss rate of defaults, and it represents the likelihood of taking a loss.
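To make the definitions concrete, here is a minimal Python sketch that counts the four confusion-matrix cells and computes TPR and FPR from actual and predicted labels. It assumes labels are encoded as 1 for “Settled” (positive) and 0 for “Past Due” (negative). The function names `confusion_counts` and `tpr_fpr` and the toy labels are hypothetical and are not taken from the article's data or code.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN), treating 1 ("Settled") as positive and 0 ("Past Due") as negative."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1  # good loan correctly approved (hit)
        elif predicted == 1 and actual == 0:
            fp += 1  # default predicted as good (missed default)
        elif predicted == 0 and actual == 1:
            fn += 1  # good loan wrongly rejected
        else:
            tn += 1  # default correctly caught
    return tp, fp, fn, tn


def tpr_fpr(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """TPR = TP / (TP + FN), FPR = FP / (FP + TN); returns 0.0 when a denominator is empty."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


# Toy labels, invented for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 1, 0, 0, 1]
print(confusion_counts(y_true, y_pred))  # (4, 1, 1, 2)
print(tpr_fpr(y_true, y_pred))
```

Here 4 of the 5 good loans are hit (TPR = 0.8) and 1 of the 3 defaults is missed (FPR of about 0.33).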

The Receiver Operating Characteristic (ROC) curve is the most widely used plot for visualizing the performance of a classification model across all thresholds. In Figure 7 left, the ROC curve for the Random Forest model is plotted. This plot shows the relationship between TPR and FPR: as the threshold is lowered, both rates move in the same direction, from 0 to 1. A good classification model usually has its ROC curve above the red baseline, which represents a “random classifier”. The Area Under the Curve (AUC) is another metric for evaluating a classification model besides precision. The AUC of this Random Forest model is 0.82 out of 1, which is decent.
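The sketch below shows, under simplifying assumptions, how an ROC curve and its AUC can be computed from model scores. Each distinct score is tried as a threshold, and a loan is predicted “Settled” when its score is at least that threshold. The area is then accumulated with the trapezoidal rule. The scores and labels are made-up toy values, not the Random Forest outputs behind Figure 7. In practice, `roc_curve` and `roc_auc_score` in scikit-learn's `sklearn.metrics` perform the same computation more efficiently.

```python
from typing import List, Sequence, Tuple


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs obtained by sweeping the threshold over every distinct score."""
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("ROC needs both positive and negative labels")
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        # Predict "Settled" (positive) when score >= t.
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule (points sorted by FPR)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy labels and scores, invented for illustration only.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
pts = roc_points(y_true, scores)
print(round(auc(pts), 4))  # 0.8889
```

Lowering the threshold moves down the sorted scores, so FPR and TPR can only stay the same or grow. This is why the curve runs monotonically from (0, 0) to (1, 1).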

Although the ROC curve clearly shows the relationship between TPR and FPR, the threshold is an implicit variable.
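Making the threshold explicit turns the revenue-versus-cost trade-off into a one-dimensional search. The following is a minimal sketch, assuming a hypothetical per-loan profit model. Each approved loan that settles earns `gain` in interest, and each approved loan that goes past due costs `loss`. Rejected loans contribute nothing. The function name `best_threshold`, both parameters, and the toy data are illustrative assumptions, not the article's method. The objective gain * TP - loss * FP is the count form of gain * P * TPR - loss * N * FPR, where P and N are the numbers of good and bad loans.

```python
from typing import Sequence, Tuple


def best_threshold(y_true: Sequence[int], scores: Sequence[float],
                   gain: float, loss: float) -> Tuple[float, float]:
    """Return (threshold, profit) maximizing gain * TP - loss * FP.

    Hypothetical cost model: an approved loan that settles (label 1) earns
    `gain`; an approved loan that goes past due (label 0) costs `loss`;
    rejected loans contribute nothing.
    """
    # Start from "approve nobody": profit 0, threshold above every score.
    best_t, best_profit = float("inf"), 0.0
    for t in sorted(set(scores)):
        # Approve (predict "Settled") every loan whose score is >= t.
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        profit = gain * tp - loss * fp
        if profit > best_profit:
            best_t, best_profit = t, profit
    return best_t, best_profit


# Toy data, invented for illustration only.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(best_threshold(y_true, scores, gain=1.0, loss=3.0))  # (0.8, 2.0)
print(best_threshold(y_true, scores, gain=1.0, loss=0.5))  # (0.6, 2.5)
```

With these toy numbers, a heavy loss per default pushes the optimal threshold up to 0.8, giving up one good loan to avoid a default. A lighter loss lets the threshold drop to 0.6 and accept one default in exchange for more interest.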