White Paper
Improving Predictive Code Quality Using Machine Learning
This paper argues that software quality is a competitive advantage and that machine learning and deep learning can improve code health by predicting defects and quality issues earlier, which reduces rework and maintenance. It cites that high-quality software can cut repairs and rework by more than 70%, and notes an estimated $312B annual cost of correcting source code defects (pages 1–2). It explains two core approaches. The first is a code analysis workflow that converts code into numerical representations using models such as abstract syntax trees (ASTs) or token sequences, then extracts token, path, or graph features and trains models such as RNNs, CNNs, GNNs, DNNs, or SVMs (page 4). The second is a defect prediction workflow that labels data (often from PROMISE or CI datasets), uses code metrics like lines of code (LOC) and cyclomatic complexity or l
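
To make the feature-extraction step of these workflows concrete, the following is a minimal sketch, not the paper's implementation. It assumes Python source as input and uses only the standard-library `ast` module to turn a snippet into a small numeric feature vector. The function name `extract_features`, the `DECISION_NODES` tuple, and the sample snippet are hypothetical, and the complexity figure is a simplified estimate (branching nodes plus one) rather than a full cyclomatic complexity calculation.

```python
import ast
from collections import Counter

# Node types treated as decision points in this simplified estimate.
# This choice is an assumption for illustration, not taken from the paper.
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                  ast.BoolOp, ast.IfExp)


def extract_features(source: str) -> dict:
    """Convert Python source text into a small numeric feature dictionary."""
    tree = ast.parse(source)          # code -> abstract syntax tree
    nodes = list(ast.walk(tree))      # flatten the tree into a node list
    node_counts = Counter(type(node).__name__ for node in nodes)

    # LOC here counts non-blank lines that are not pure comment lines.
    loc = sum(
        1
        for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )

    # Simplified complexity estimate: one path plus one per decision node.
    decisions = sum(isinstance(node, DECISION_NODES) for node in nodes)

    return {
        "loc": loc,
        "cyclomatic_estimate": decisions + 1,
        "num_functions": node_counts["FunctionDef"],
        "num_calls": node_counts["Call"],
        "num_nodes": len(nodes),
    }


# Hypothetical sample input used only to exercise the function.
SAMPLE = '''
def clamp(x, lo, hi):
    # keep x within [lo, hi]
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x
'''

features = extract_features(SAMPLE)
print(features)
```

In a defect prediction setting, one such dictionary per file or function could be paired with a defect label and passed to a classifier of the kind listed above; which features and which model work best is an empirical question that this sketch does not address.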
