At Microsoft, 47,000 developers generate nearly 30,000 bugs a month, which are stored across more than 100 Azure DevOps and GitHub repositories, and the company needed a way to quickly spot the critical ones and stay ahead of the hackers. According to Scott Christiansen, a senior security programme manager at Microsoft, such large volumes of semi-curated data are perfect for machine learning: since 2001, Microsoft has collected 13 million work items and bugs.
"We used that data to develop a process and machine learning model that correctly distinguishes between security and non-security bugs 99 per cent of the time and accurately identifies the critical, high priority security bugs, 97 per cent of the time," informed Christiansen.
It's a machine learning model that's designed to help developers accurately identify and prioritize critical security issues that need fixing.
"Our goal was to build a machine learning system that classifies bugs as security/non-security and critical/non-critical with a level of accuracy that is as close as possible to that of a security expert," informed the Microsoft executive.
To accomplish this, Microsoft fed the model a large number of bugs labelled as security issues and others that were not. Once trained, the model could use what it had learned to label data that had not been pre-classified.
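As a rough illustration of that workflow, the sketch below trains a simple text classifier on bug titles labelled security or non-security and then uses it to label new, unclassified bugs. This is not Microsoft's actual system; the sample data, the TF-IDF features and the logistic-regression model are illustrative assumptions.

```python
# A minimal sketch, not Microsoft's system: train a classifier on bug titles
# labelled security / non-security, then label bugs that were never classified.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: bug titles paired with security / non-security labels.
titles = [
    "Buffer overflow in login handler",
    "SQL injection possible via search box",
    "Typo in settings page label",
    "Dark mode toggle does not persist",
]
labels = ["security", "security", "non-security", "non-security"]

# TF-IDF features over the title text feed a simple linear classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(titles, labels)

# Once trained, the model assigns a label to bugs that were not pre-classified.
new_bugs = ["Cross-site scripting in comment field", "Button misaligned on mobile"]
print(model.predict(new_bugs))
```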
Every day, software developers stare down a long list of features and bugs that need to be addressed. Security professionals try to help by using automated tools to prioritize security bugs, but too often engineers waste time on false positives or miss a critical security vulnerability that has been misclassified. To tackle this problem, Microsoft's data science and security teams came together to explore how machine learning could help.
"We discovered that by pairing machine learning models with security experts, we can significantly improve the identification and classification of security bugs," Christiansen noted.