Assessing and Improving Prediction and Classification: Theory and Algorithms in C++
Book Details
- Author: Timothy Masters
- Categories: Machine Learning, Algorithmic Trading, Statistical Analysis
Quick Summary
Timothy Masters provides a rigorous treatment of prediction and classification assessment methodologies, covering performance measures, cross-validation, bootstrap resampling, ROC curves, and advanced algorithms for improving model accuracy, with complete C++ implementations applicable to trading system development.
Detailed Summary
"Assessing and Improving Prediction and Classification" by Timothy Masters, published by Apress in 2018, is a technically demanding work that addresses the critical but often neglected question of how to properly evaluate and improve predictive models. While applicable broadly, the methodologies are directly relevant to quantitative trading where prediction quality determines profitability.
Chapter 1 covers the assessment of numeric predictions, establishing the notation and framework for the entire book. Masters presents an overview of performance measures, addresses the crucial concepts of consistency and evolutionary stability, and tackles selection bias and the need for three datasets (training, validation, and test). Cross-validation and walk-forward testing receive thorough treatment, including bias in cross-validation, overlap considerations, nonstationarity assessment, and nested cross-validation. Common performance measures (MSE, MAE, R-squared, RMSE, nonparametric correlation, success ratios) are presented alongside alternatives and methods for stratification to assess consistency. Confidence interval construction is covered in depth, including serial correlation, multiplicative data, and empirical quantiles.
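To make the measures and the walk-forward idea above concrete, here is a minimal C++ sketch. It is an illustration written for this summary, not code from the book: the names `ErrorMeasures`, `compute_error_measures`, `Fold`, and `walk_forward_folds` are hypothetical, R-squared is taken as one minus the ratio of squared error to total variation around the mean of the actual values, and the walk-forward scheme assumes a fixed-length training window that slides forward by one test block at a time.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for common numeric-prediction error measures.
struct ErrorMeasures {
    double mse;        // mean squared error
    double rmse;       // square root of MSE
    double mae;        // mean absolute error
    double r_squared;  // 1 - SSE / SST (assumed definition)
};

ErrorMeasures compute_error_measures(const std::vector<double>& actual,
                                     const std::vector<double>& predicted) {
    assert(!actual.empty() && actual.size() == predicted.size());
    const double n = static_cast<double>(actual.size());

    double mean = 0.0;
    for (double y : actual) mean += y;
    mean /= n;

    double sse = 0.0, sae = 0.0, sst = 0.0;
    for (std::size_t i = 0; i < actual.size(); ++i) {
        const double err = actual[i] - predicted[i];
        sse += err * err;                                // squared error
        sae += std::fabs(err);                           // absolute error
        sst += (actual[i] - mean) * (actual[i] - mean);  // total variation
    }

    ErrorMeasures m;
    m.mse = sse / n;
    m.rmse = std::sqrt(m.mse);
    m.mae = sae / n;
    // Guard against a constant target, where SST is zero.
    m.r_squared = (sst > 0.0) ? 1.0 - sse / sst : 0.0;
    return m;
}

// Half-open index ranges [begin, end) for one walk-forward fold.
struct Fold {
    std::size_t train_begin, train_end, test_begin, test_end;
};

// Train on a window of train_len cases, test on the next test_len cases,
// then slide the window forward by test_len. The test block always lies
// strictly after its training block in time.
std::vector<Fold> walk_forward_folds(std::size_t n, std::size_t train_len,
                                     std::size_t test_len) {
    assert(train_len > 0 && test_len > 0);
    std::vector<Fold> folds;
    for (std::size_t start = 0; start + train_len + test_len <= n;
         start += test_len) {
        const std::size_t split = start + train_len;
        folds.push_back({start, split, split, split + test_len});
    }
    return folds;
}
```

With 10 cases, a training window of 4, and test blocks of 2, `walk_forward_folds` yields three folds whose test blocks cover cases 4 through 9 without overlap. Pooling the out-of-sample predictions from those blocks and passing them to `compute_error_measures` gives a single out-of-sample performance figure.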
Chapter 2 addresses class prediction assessment through confusion matrices, expected gain/loss analysis, and ROC (Receiver Operating Characteristic) curves. The treatment of ROC methodology is exceptionally thorough, covering hits, false alarms, computation methods, area under the curve, cost considerations, threshold optimization, precision maximization, and generalized targets. Bayesian methods and hypothesis testing for classification are compared in depth.
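The confusion matrix and ROC quantities mentioned above can be sketched in a few lines of C++. This is an illustrative sketch, not the book's implementation: the names `Confusion`, `confusion_at`, `hit_rate`, `false_alarm_rate`, and `roc_auc` are hypothetical, labels are assumed to be 1 for the target class and anything else for the non-target class, and a case is assumed to be classified as a target when its score is at or above the threshold. The area under the curve is computed by direct pair counting, using the standard fact that it equals the probability that a randomly chosen target case outscores a randomly chosen non-target case, with ties counted as one half.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts for a two-class confusion matrix (label 1 = target class).
struct Confusion {
    std::size_t tp = 0, fp = 0, tn = 0, fn = 0;
};

// Classify each case as target when score >= threshold and tally outcomes.
Confusion confusion_at(const std::vector<double>& scores,
                       const std::vector<int>& labels, double threshold) {
    assert(scores.size() == labels.size());
    Confusion c;
    for (std::size_t i = 0; i < scores.size(); ++i) {
        const bool predicted = scores[i] >= threshold;
        const bool actual = labels[i] == 1;
        if (predicted && actual) ++c.tp;        // hit
        else if (predicted && !actual) ++c.fp;  // false alarm
        else if (!predicted && actual) ++c.fn;  // miss
        else ++c.tn;                            // correct rejection
    }
    return c;
}

// Fraction of true targets that were detected.
double hit_rate(const Confusion& c) {
    const std::size_t pos = c.tp + c.fn;
    return pos ? static_cast<double>(c.tp) / static_cast<double>(pos) : 0.0;
}

// Fraction of non-targets that were wrongly flagged.
double false_alarm_rate(const Confusion& c) {
    const std::size_t neg = c.fp + c.tn;
    return neg ? static_cast<double>(c.fp) / static_cast<double>(neg) : 0.0;
}

// Area under the ROC curve by pair counting: O(n_pos * n_neg), which is
// fine for a sketch but slower than a sort-based method on large data.
double roc_auc(const std::vector<double>& scores,
               const std::vector<int>& labels) {
    assert(scores.size() == labels.size());
    double wins = 0.0;
    std::size_t n_pos = 0, n_neg = 0;
    for (int y : labels) (y == 1 ? n_pos : n_neg) += 1;
    assert(n_pos > 0 && n_neg > 0);
    for (std::size_t i = 0; i < scores.size(); ++i) {
        if (labels[i] != 1) continue;
        for (std::size_t j = 0; j < scores.size(); ++j) {
            if (labels[j] == 1) continue;
            if (scores[i] > scores[j]) wins += 1.0;
            else if (scores[i] == scores[j]) wins += 0.5;
        }
    }
    return wins / (static_cast<double>(n_pos) * static_cast<double>(n_neg));
}
```

Sweeping the threshold passed to `confusion_at` and plotting `hit_rate` against `false_alarm_rate` traces the ROC curve point by point. The same sweep is the natural place to apply a cost-based or precision-based threshold choice.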
Chapter 3 covers resampling methods for parameter estimation, including bootstrap estimation of bias and variance, plug-in estimators, confidence intervals (including the percentile method and its improvements), hypothesis tests, and bootstrapping of ratio statistics and dependent data. The jackknife method is covered as an alternative approach.
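The resampling ideas above can be illustrated with a short C++ sketch. It is not taken from the book: the names `Statistic`, `sample_mean`, `BootstrapResult`, `bootstrap`, `JackknifeResult`, and `jackknife` are hypothetical, the data are assumed to be independent (dependent data need a different resampling scheme), and the confidence interval uses the basic percentile method with a simple floor-based quantile index rather than any of its refinements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

using Statistic = std::function<double(const std::vector<double>&)>;

// Example statistic; any function of the sample can be substituted.
double sample_mean(const std::vector<double>& x) {
    double sum = 0.0;
    for (double v : x) sum += v;
    return sum / static_cast<double>(x.size());
}

struct BootstrapResult {
    double estimate;  // statistic on the original sample (plug-in estimate)
    double bias;      // mean of bootstrap replicates minus the estimate
    double variance;  // sample variance of the bootstrap replicates
    double ci_low;    // percentile-method lower bound
    double ci_high;   // percentile-method upper bound
};

BootstrapResult bootstrap(const std::vector<double>& data,
                          const Statistic& stat, std::size_t n_reps,
                          double alpha, unsigned seed) {
    assert(!data.empty() && n_reps >= 2 && alpha > 0.0 && alpha < 1.0);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, data.size() - 1);

    std::vector<double> reps(n_reps);
    std::vector<double> resample(data.size());
    for (std::size_t r = 0; r < n_reps; ++r) {
        // Draw n cases with replacement from the original sample.
        for (double& x : resample) x = data[pick(rng)];
        reps[r] = stat(resample);
    }

    const double mean = sample_mean(reps);
    double var = 0.0;
    for (double v : reps) var += (v - mean) * (v - mean);
    var /= static_cast<double>(n_reps - 1);

    // Percentile interval: empirical quantiles of the sorted replicates.
    std::sort(reps.begin(), reps.end());
    const double last = static_cast<double>(n_reps - 1);
    const std::size_t lo = static_cast<std::size_t>((alpha / 2.0) * last);
    const std::size_t hi = static_cast<std::size_t>((1.0 - alpha / 2.0) * last);

    BootstrapResult out;
    out.estimate = stat(data);
    out.bias = mean - out.estimate;
    out.variance = var;
    out.ci_low = reps[lo];
    out.ci_high = reps[hi];
    return out;
}

struct JackknifeResult {
    double bias;      // (n - 1) * (mean of leave-one-out values - estimate)
    double variance;  // (n - 1) / n * sum of squared leave-one-out deviations
};

JackknifeResult jackknife(const std::vector<double>& data,
                          const Statistic& stat) {
    assert(data.size() >= 2);
    const std::size_t n = data.size();
    const double nd = static_cast<double>(n);

    std::vector<double> loo(n);
    std::vector<double> subset;
    for (std::size_t i = 0; i < n; ++i) {
        // Recompute the statistic with case i left out.
        subset.assign(data.begin(), data.end());
        subset.erase(subset.begin() + static_cast<std::ptrdiff_t>(i));
        loo[i] = stat(subset);
    }

    const double mean_loo = sample_mean(loo);
    double ss = 0.0;
    for (double v : loo) ss += (v - mean_loo) * (v - mean_loo);

    JackknifeResult out;
    out.bias = (nd - 1.0) * (mean_loo - stat(data));
    out.variance = (nd - 1.0) / nd * ss;
    return out;
}
```

For the sample mean, the jackknife bias estimate is exactly zero and the jackknife variance estimate reduces to the familiar sample variance divided by n, which makes the mean a convenient sanity check before trying a less tractable statistic. The bootstrap results depend on the seed and on the standard library's `std::uniform_int_distribution`, so exact values may differ between platforms.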
Throughout, Masters provides complete C++ implementations, making the book both a theoretical treatment and a practical programming guide. The emphasis on proper statistical methodology (particularly avoiding overfitting, managing selection bias, and constructing valid confidence intervals) makes it essential reading for quantitative traders and system developers who need to distinguish genuine predictive ability from statistical artifacts.