Professional software developers have a love-hate relationship with static code analyzers. These tools burst onto the scene in the 2000s, offering what sounded almost too good to be true: the ability to automatically scan programs and locate software flaws that would otherwise be impossible or far more expensive to discover through traditional human code reviews. These tools could reduce cost while improving quality, safety, and security. Fifteen years later, static code analyzers are a disappointment to many, a technology that has not lived up to its promise. One of the main reasons: too much noise.
This post was originally published on Inside BlackBerry Blog.
Static Code Analyzers: The Noisy Car Alarms of Software Development
When car alarms first came to market, they were designed to deter and reduce car theft by making loud noises and alerting those in the immediate area that something nefarious was going on. Unfortunately, many car alarms are too sensitive and trigger by accident when anyone touches the car or walks too close to it. As a result, we hear car alarms, get annoyed, and wait for them to shut off rather than calling the police. Most of the time we’re right to do so; any security technology that consistently provides false alarms is eventually ignored and shunned by its users.
In static analysis, these false alarms are known as “false positives”: the analyzer either flags good code as flawed or reports flaws that don’t make sense to fix. Examples include a potential null pointer dereference that the developer knows will never occur in practice, or a flaw so minor (e.g., not exploitable) that modifying the code to address it introduces more risk than doing nothing.
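To make the null-dereference case concrete, here is a minimal Python sketch. The names `BUILD_LEVELS`, `lookup_level`, and `next_level` are hypothetical and invented for illustration; they do not come from any real codebase. A type checker or similar analyzer can reasonably warn that `level` might be `None` on the last line, even though in this imagined program every caller validates `mode` first.

```python
from typing import Optional

# Hypothetical lookup table, for illustration only.
BUILD_LEVELS = {"debug": 0, "release": 1}


def lookup_level(mode: str) -> Optional[int]:
    """Return the level for a build mode, or None if the mode is unknown."""
    return BUILD_LEVELS.get(mode)


def next_level(mode: str) -> int:
    # Assumption: callers only ever pass "debug" or "release", so the
    # lookup cannot return None here. The analyzer cannot see that
    # convention, so it may still flag the addition as a possible
    # operation on None.
    level = lookup_level(mode)
    return level + 1
```

The warning is technically defensible, which is what makes it costly: someone has to read the code, confirm the calling convention, and decide that no fix is needed.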
Our R&D teams have used a wide range of open source and commercial static analyzers over the years, and they all generate a large volume of noise. Numerous development teams and academic researchers have corroborated these experiences, including a recent study of developer sentiment at Microsoft. Developers spend considerable effort “analyzing the analyzer” to decide which of the reported flaws are worth addressing. Because this process is so expensive, the tool ends up being used sparingly or not at all. Noisy analyzers defeat their original purpose of generating higher quality code more efficiently. What a shame.
How We’re Fixing Code Analysis to Quiet False Alarms
BlackBerry CHACE has been attacking this important problem from a couple of angles. First, we’re working on newer techniques based on formal methods that are more precise than traditional static code analyzers. Just as importantly, we’re expanding the purview of automated analysis to include aspects of software development beyond the code itself. Developers who use static analyzers know that the tools raise so many false alarms because they don’t understand the context of the code and how it’s meant to be used.
Imagine you saw someone trying to get into a locked car – how would you know if they were trying to steal it? Most of us would instinctively look for clues in the person’s behavior and environment. Do they seem calm, nervous, or annoyed? Are they looking around for help or to make sure no one is watching? Are you on a busy street or in a dimly lit alley? Is it dark outside or broad daylight? These are all contextual clues that can help you decipher the situation and avoid falsely accusing someone who might simply have locked their keys in their car.
Similarly, contextual clues provide insight into how code is meant to be used and help avoid false positives. CHACE has partnered with the University of Waterloo and other top institutions to apply machine learning to this problem, combining the analyzer output with the enormous amount of contextual information available during the software development cycle. For example, developers have a configuration management system that records code churn (rate of change), the overall maturity of the affected software, and who “owns” the code and makes most of the changes in the affected area. Developers also have easy access to additional software metrics, such as code complexity and lines of code.
Our cooperative research focuses on determining how these additional factors can reduce the number of false alarms. Once this process is optimized, machine learning can automatically tune the analyzer output to a set of reports that dramatically reduces false positives with minimal loss of true positives. With a far higher share of actionable reports, static analysis can deliver on its promise of improved quality and security at reduced development cost and time to market.
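As a rough illustration of the idea, the sketch below trains a tiny classifier on past triage decisions and uses it to filter new alarms. It is not the CHACE tool or its algorithm: logistic regression is used here only because it is simple, the `Alarm` fields are assumed stand-ins for the context factors mentioned above (churn, maturity, complexity), and the triage history is made up for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Alarm:
    """One analyzer report with hypothetical context features scaled to 0..1."""
    churn: float       # how often the surrounding code changes
    maturity: float    # how long the code has been stable
    complexity: float  # normalized complexity metric
    actionable: bool = False  # label from past developer triage


def features(alarm):
    # The leading 1.0 acts as the bias term.
    return [1.0, alarm.churn, alarm.maturity, alarm.complexity]


def predict(weights, alarm):
    """Estimated probability that an alarm is worth a developer's time."""
    z = sum(w * x for w, x in zip(weights, features(alarm)))
    return 1.0 / (1.0 + math.exp(-z))


def train(history, epochs=500, learning_rate=0.5):
    """Fit logistic-regression weights with plain stochastic gradient descent."""
    weights = [0.0] * 4
    for _ in range(epochs):
        for alarm in history:
            error = predict(weights, alarm) - float(alarm.actionable)
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features(alarm))]
    return weights


def filter_alarms(weights, alarms, threshold=0.5):
    """Keep only alarms the model scores at or above the threshold."""
    return [a for a in alarms if predict(weights, a) >= threshold]


# Made-up triage history, for illustration only (not measured data).
history = [
    Alarm(0.9, 0.1, 0.8, True),
    Alarm(0.8, 0.2, 0.7, True),
    Alarm(0.7, 0.3, 0.9, True),
    Alarm(0.1, 0.9, 0.2, False),
    Alarm(0.2, 0.8, 0.3, False),
    Alarm(0.3, 0.7, 0.1, False),
]

weights = train(history)
incoming = [Alarm(0.8, 0.2, 0.8), Alarm(0.2, 0.8, 0.2)]
kept = filter_alarms(weights, incoming)  # alarms passed on to developers
```

The `threshold` parameter is where the trade-off lives: raising it suppresses more noise but risks discarding real defects, while lowering it keeps more true positives at the cost of more false alarms.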
Machine Learning Cuts False Alarms by 75%
The machine-learning tool we’ve been developing is showing promising results. The tool reduced false positives by 75% on a range of mature open source and commercial software projects. Without the tool, only 18% of generated alarms were actionable; with the tool, 93% of the alarms represented issues that needed to be fixed. This accuracy improvement represents a big step forward for the practical deployment of static analysis tools.