ISSN: 2056-3736 (Online Version) | 2056-3728 (Print Version)

Machine Learning Risk Models

Zura Kakushadze and Willie Yu

Correspondence: Zura Kakushadze

Quantigic Solutions LLC, USA


We give an explicit algorithm and source code for constructing risk models based on machine learning techniques. The resultant covariance matrices are not factor models. Based on empirical backtests, we compare the performance of these machine learning risk models to other constructions, including statistical risk models, risk models based on fundamental industry classifications, and risk models based on industry classifications built via multilevel clustering.


Keywords: machine learning; risk model; clustering; k-means; statistical risk models; covariance; correlation; variance; cluster number; risk factor; optimization; regression; mean-reversion; factor loadings; principal component; industry classification; quant; trading; dollar-neutral; alpha; signal; backtest


Campbell, L.L. (1960) Minimum coefficient rate for stationary random processes. Information and Control 3(4): 360-371.

Forgy, E.W. (1965) Cluster analysis of multivariate data: efficiency versus interpretability of classifications. Biometrics 21(3): 768-769.

Grinold, R.C. and Kahn, R.N. (2000) Active Portfolio Management. New York, NY: McGraw-Hill.

Hartigan, J.A. (1975) Clustering Algorithms. New York, NY: John Wiley & Sons, Inc.

Hartigan, J.A. and Wong, M.A. (1979) Algorithm AS 136: A K-Means Clustering Algorithm. Journal of the Royal Statistical Society, Series C (Applied Statistics) 28(1): 100-108.

Kakushadze, Z. (2015a) Mean-Reversion and Optimization. Journal of Asset Management 16(1): 14-40.

Kakushadze, Z. (2015b) Russian-Doll Risk Models. Journal of Asset Management 16(3): 170-185.

Kakushadze, Z. (2015c) Heterotic Risk Models. Wilmott Magazine 2015(80): 40-55.

Kakushadze, Z. (2016) Shrinkage = Factor Model. Journal of Asset Management 17(2): 69-72.

Kakushadze, Z. and Yu, W. (2016a) Multifactor Risk Models and Heterotic CAPM. Journal of Investment Strategies 5(4): 1-49.

Kakushadze, Z. and Yu, W. (2016b) Statistical Industry Classification. Journal of Risk & Control 3(1): 17-65.

Kakushadze, Z. and Yu, W. (2017a) Statistical Risk Models. Journal of Investment Strategies 6(2): 11-40.

Ledoit, O. and Wolf, M. (2004) Honey, I Shrunk the Sample Covariance Matrix. Journal of Portfolio Management 30(4): 110-119.

Lloyd, S.P. (1957) Least squares quantization in PCM. Working Paper. Bell Telephone Laboratories, Murray Hill, NJ.

Lloyd, S.P. (1982) Least squares quantization in PCM. IEEE Transactions on Information Theory 28(2): 129-137.

MacQueen, J.B. (1967) Some Methods for Classification and Analysis of Multivariate Observations. In: LeCam, L. and Neyman, J. (eds.) Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability. Berkeley, CA: University of California Press, pp. 281-297.

Markowitz, H.M. (1952) Portfolio Selection. Journal of Finance 7(1): 77-91.

Rebonato, R. and Jäckel, P. (1999) The Most General Methodology to Create a Valid Correlation Matrix for Risk Management and Option Pricing Purposes. Journal of Risk 2(2): 17-28.

Roy, O. and Vetterli, M. (2007) The effective rank: A measure of effective dimensionality. In: European Signal Processing Conference (EUSIPCO). Poznań, Poland (September 3-7, 2007), pp. 606-610.

Sharpe, W.F. (1994) The Sharpe Ratio. Journal of Portfolio Management 21(1): 49-58.

Steinhaus, H. (1957) Sur la division des corps matériels en parties. Bull. Acad. Polon. Sci. 4(12): 801-804.

Yang, W., Gibson, J.D. and He, T. (2005) Coefficient rate and lossy source coding. IEEE Transactions on Information Theory 51(1): 381-386.