Building a Web Vulnerability Scanner
In May this year we launched SecAlerts, a free security product that sends subscribers a customised weekly report of vulnerabilities and security news relevant to their software stack. We deliberately made the service a low-barrier way to keep users informed and, as we near 1,000 subscribers, our decision to 'keep it simple' appears to have merit.
We built the Security Audit Tool using an instance of Wappalyzer to detect the software behind a given URL. It works by scraping the website and looking for small clues that give away the software in use. For instance, a WordPress blog will likely have wp-content in its HTML, and Nginx will usually respond with a Server: nginx response header. From these clues we build up a list of the software that likely powers the website.
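The clue-matching approach can be sketched in a few lines. The fingerprint table below is hypothetical and tiny (Wappalyzer's real ruleset covers thousands of technologies), and the function takes pre-fetched HTML and headers rather than doing the HTTP request itself:

```python
import re

# Hypothetical fingerprint table: each technology maps to a clue found
# either in the HTML body or in an HTTP response header.
FINGERPRINTS = {
    "WordPress": {"html": re.compile(r"wp-content")},
    "Nginx": {"header": ("Server", re.compile(r"nginx", re.I))},
    "Varnish": {"header": ("Via", re.compile(r"varnish", re.I))},
}

def detect(html, headers):
    """Return names of technologies whose clues appear in the response."""
    found = []
    for name, clues in FINGERPRINTS.items():
        if "html" in clues and clues["html"].search(html):
            found.append(name)
        if "header" in clues:
            key, pattern = clues["header"]
            if pattern.search(headers.get(key, "")):
                found.append(name)
    return found
```

A real detector would also weigh confidence per clue and try to extract version numbers from the matches, which is where this approach starts to struggle.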
For example, the presence of webpackJsonp in a page's scripts is one such clue, while Varnish is detected by looking for specific response headers.
Once we have a collection of software detected from a URL, we convert each item to what's known as a CPE. A CPE (Common Platform Enumeration) is a structured naming scheme used in public vulnerability databases to identify affected software and versions. The CPE for Varnish looks like cpe:2.3:a:varnish-cache:varnish:*:*:*:*:*:*:*:*. Using a lookup table we generate the CPE for each detected product, then search our vulnerability database for matching vulnerabilities published in the last six months.
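The lookup-and-match step might look like the following sketch. The CPE table and vulnerability records here are illustrative stand-ins, not our real data:

```python
from datetime import date, timedelta

# Hypothetical lookup table from a detected product name to a CPE prefix.
CPE_TABLE = {
    "Varnish": "cpe:2.3:a:varnish-cache:varnish",
    "WordPress": "cpe:2.3:a:wordpress:wordpress",
}

# Toy vulnerability records; a real database holds full CVE entries.
VULNS = [
    {"id": "CVE-EXAMPLE-0001",
     "cpe": "cpe:2.3:a:varnish-cache:varnish",
     "published": date.today() - timedelta(days=30)},
    {"id": "CVE-EXAMPLE-0002",
     "cpe": "cpe:2.3:a:wordpress:wordpress",
     "published": date.today() - timedelta(days=400)},
]

def recent_vulns(detected, window_days=180):
    """Map detected software to CPEs, return vulns from the last window."""
    cutoff = date.today() - timedelta(days=window_days)
    cpes = {CPE_TABLE[name] for name in detected if name in CPE_TABLE}
    return [v["id"] for v in VULNS
            if v["cpe"] in cpes and v["published"] >= cutoff]
```

Because the detector rarely knows the exact version, the match is typically done on the product part of the CPE, which is why false positives are hard to avoid.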
This method of vulnerability scanning, while simple for the user, has its downsides: the detected software may be incorrect, some software is impossible to detect, and versions can rarely be identified accurately. Let's go over the pros and cons of different vulnerability scanners...
Remote scanning has very limited access to the system. It starts with an IP address or URL and must infer as much as it can from what the system or network reveals. Remote scanning is generally limited to remote attacks and other forms of remote detection like our Security Audit Tool. Other remote scanners attempt to detect the software and then run a set of benign attacks from public exploit databases. This is generally not very useful unless your infrastructure runs severely outdated software, because public exploits are released infrequently and many are not remotely exploitable (only 15% of exploits on Exploit-DB are).
Some remote scanners attempt an automated pentest by running basic heuristic checks: attempting SQL injection in input fields, entering scripts into inputs to test for XSS, looking for hidden URLs in robots.txt, validating HTTP security headers, and guessing common subdomains and paths (/admin, /wp-admin). Depending on the scanner's capabilities this can be worth the effort, though frequent scans add little value since they run largely the same tests over and over again. A manual pentester would try different paths and techniques.
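One of the cheapest of those heuristic checks is validating HTTP security headers. A minimal sketch, using a common baseline of headers (the list is an assumption, not an exhaustive policy):

```python
# Security headers commonly expected on a hardened site. This baseline
# is illustrative; real scanners also check the header values.
EXPECTED_HEADERS = [
    "Strict-Transport-Security",
    "Content-Security-Policy",
    "X-Content-Type-Options",
    "X-Frame-Options",
]

def missing_security_headers(headers):
    """Return expected security headers absent from a response.

    HTTP header names are case-insensitive, so comparison is lowercased.
    """
    present = {name.lower() for name in headers}
    return [h for h in EXPECTED_HEADERS if h.lower() not in present]
```

Checks like this are deterministic, which is exactly why rerunning them frequently against an unchanged site yields the same report every time.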
Local scanning has a much better chance of finding vulnerabilities because the scanner is installed on the system and can inspect the file system directly to find the installed and running software. A great middle ground is known as "agentless scanning", where a scanner does not need to be installed on the target machine; it simply uses an SSH connection to gather the information.
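An agentless scanner typically runs a package-listing command over SSH (for example, dpkg -l on Debian-based systems) and parses the output. The sketch below handles only the parsing side and assumes dpkg's standard output format; the SSH transport is omitted:

```python
def parse_dpkg_list(output):
    """Parse `dpkg -l` output into {package: version} for installed packages."""
    packages = {}
    for line in output.splitlines():
        parts = line.split()
        # Installed packages are flagged "ii"; other rows are the dpkg
        # header block or packages in another state (removed, etc.).
        if len(parts) >= 3 and parts[0] == "ii":
            packages[parts[1]] = parts[2]
    return packages
```

With exact package names and versions in hand, the CPE matching described earlier becomes far more precise than remote fingerprinting.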
While scanners can be useful in case a major public exploit affects you or unknown services are open on a port, the best defense is to keep software up to date and stay alert to new vulnerabilities that could affect your stack. If you want a simple way to get alerts, you can sign up to SecAlerts for weekly vulnerabilities and security news completely customised to your stack.