I have long had a problem with software authors insisting that version and copyright information from their software be visible. The issue first got my attention a few years ago, when a Simple Machines Forum (SMF) exploit was created that used Google to identify targets. A particular version of SMF was vulnerable, and because SMF displays a consistent copyright and version number, the malware author was able to create a simple Google query that could direct him to essentially every vulnerable SMF forum on the Internet. Fortunately, my forum wasn’t impacted, but not because my site wasn’t found – I run the Suhosin security patch for PHP, which blocked the attack.
Some would say this is security through obscurity, and to an extent, it is. It’s hard to find a real-world parallel to this, but I’ll try:
Imagine your house has a certain brand of locks on all the doors. Someone finds a weakness in the lock that makes it extremely easy to open a locked door. That’s the vulnerable web site. Now, imagine the lock manufacturer keeps a list of all the houses in the world that use the problematic lock, and on which doors the lock is installed. Now think about leaving copies of that list all over the place. That seems pretty crazy, but in Internet terms, trying to keep that list private would be called “security through obscurity”. Indeed, the Internet example is much more aggressive, since location is irrelevant and the break-ins can be scripted to happen at a rate of hundreds or thousands per minute.
I am not proposing that removing copyrights and version numbers become standard practice as an alternative to other security measures – to do so would be very foolish. Removing them simply reduces or eliminates the automation and mass exploit capabilities that attackers currently have.
So, what got me thinking about that was this log I saw today for another site I administer, syslog.org:
71.52.247.235 - - [26/Mar/2010:10:07:54 -0400] "GET /forum/web-server-logs/continuing-attack-attempts-against-smf/ HTTP/1.1" 200 32542 "http://www.google.com/search?q=%22SMF+%C2%A9+2006-2009%2C+Simple+Machines+LLC%22&ie=utf-8&oe=utf-8&aq=t&client=firefox-a&rlz=1R1GGGL_en___US363" "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2 GTB6 (.NET CLR 3.5.30729)"
If you paste the referrer into your web browser, you can see that the visitor came from a Google search for the term: “SMF © 2006-2009, Simple Machines LLC”. In this case, the “visitor” was not out to attack the site. He was simply a forum spammer who creates an account and decorates that account with links to sites he is promoting. It’s pretty clever, actually: good forum mods will react to spammy forum posts, deleting both the post and the account, but simply creating an account generally doesn’t set off an alarm. The account quietly sits there with its payload of links, waiting to be indexed by search engines.
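For anyone who would rather not paste referrers into a browser, here is a minimal sketch of pulling the search term out of a log line like the one above. It assumes the log is in Apache's combined format and that the search engine puts the query in a `q` parameter, as Google did here; the `LOG_PATTERN` and `search_terms` names are my own, not part of any tool.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Assumption: Apache "combined" log format, where the last two quoted
# fields are the referrer and the user agent.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def search_terms(log_line):
    """Return the decoded 'q' parameter of the referrer, or None."""
    match = LOG_PATTERN.match(log_line.strip())
    if match is None:
        return None
    # parse_qs undoes the URL encoding (%22 -> ", + -> space, %C2%A9 -> ©).
    query = parse_qs(urlsplit(match.group("referrer")).query)
    terms = query.get("q")
    return terms[0] if terms else None

line = (
    '71.52.247.235 - - [26/Mar/2010:10:07:54 -0400] '
    '"GET /forum/web-server-logs/continuing-attack-attempts-against-smf/ HTTP/1.1" '
    '200 32542 '
    '"http://www.google.com/search?q=%22SMF+%C2%A9+2006-2009%2C+Simple+Machines+LLC%22'
    '&ie=utf-8&oe=utf-8&aq=t&client=firefox-a&rlz=1R1GGGL_en___US363" '
    '"Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.2) '
    'Gecko/20100316 Firefox/3.6.2 GTB6 (.NET CLR 3.5.30729)"'
)

print(search_terms(line))  # "SMF © 2006-2009, Simple Machines LLC" (quotes included)
```

Running something like this over a whole access log would show how many visitors arrive by searching for the software's footer text rather than for anything the site is actually about.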
This is yet another reason to stop the madness of including the standard copyright for such software on a web site. In this case, the spammer specifically targeted Simple Machines forums. He most likely started at the top of the list of search results and worked his way down, registering accounts as he went, and may even have a script that does most of the work for him. This is mostly a nuisance, but it does show that there are few upsides and a lot of downsides to letting search engines create nice, neat lists of which sites are running which software.
Some software projects are militant about the copyright being displayed; others are more relaxed. Generally, even the strictest projects will allow removing the version number. It is prudent to remove as much information as the license will permit, to make your site less of a target.
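One way to check what your own pages still give away is to scan the rendered HTML for footer strings that a search engine could index. The sketch below is only illustrative: the two patterns are my assumptions about what an SMF footer looks like (a version string and the copyright line from the search above), and `FOOTPRINTS` and `find_footprints` are hypothetical names. Adjust the patterns to whatever software you run.

```python
import re

# Assumed footprint patterns; extend this list for other software.
FOOTPRINTS = [
    # A version string such as "SMF 1.1.11" (format assumed).
    re.compile(r"SMF\s+\d+(?:\.\d+)+"),
    # The copyright line, with the symbol literal or as an HTML entity.
    re.compile(r"SMF\s+(?:©|&copy;|&#169;)\s+\d{4}-\d{4},\s+Simple Machines LLC"),
]

def find_footprints(html):
    """Return every substring of html that matches a known footprint."""
    return [m.group(0) for p in FOOTPRINTS for m in p.finditer(html)]

# Hypothetical footer markup, for illustration only.
sample = (
    '<a href="#">Powered by SMF 1.1.11</a> | '
    '<a href="#">SMF © 2006-2009, Simple Machines LLC</a>'
)

for hit in find_footprints(sample):
    print("searchable footprint:", hit)
```

An empty result does not mean a site is safe, of course. It only means this particular kind of search query will no longer find it.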