Spam-Proofing Your Guest Book or Blog

What a shame that spammers and snake-oil peddlers have not only reduced email to a nearly useless, rat-infested maze, but have also taken to littering web site blogs and guest books with links to their crap in the hopes of achieving more prominent rankings in Internet search engines.  Here at Bytebrothers, for months I was forced to manually remove hundreds of links left in my web sites' guest books for sex, porn, drugs, gambling and all the other illicit crap and fad mediocrity that spammers peddle.  This isn't just annoying, it's even more time consuming than getting rid of email spam, and left unchecked it would reduce the guest books to useless noise.

FUCK THOSE BASTARDS!

I DID!

As a web site author or operator your choices are to moderate your blogs and guest books, where every post requires your approval, to require registration before allowing posts, or to try some automated way of repelling the spammers' link-loading attempts.  Since the first option involves a huge amount of ongoing maintenance and the second makes life difficult for honest visitors, I'm a big fan of the automated approach.

With the automated approach, we need to discern whether or not the poster really is a "guest", and we need to check the content being posted.  Then we can easily repel the attempts to post the spam, and optionally we can implement mechanisms to punish the perpetrators.  It's remarkably easy to implement a strong defense without discouraging real visitors.

Guest or not, here I come...

As you navigate a web site, each time you click on a hyperlink the web browser sends information to the web server about which page you came from.  This is known as the "HTTP referrer".  Since a true guest to your web site would most certainly have clicked a link on another page of your site to reach the guest book, the guest book page should see the referrer value as an identifiable page from your web site.  If it's empty or its value is some unrecognizable web site or page, we've almost certainly got a spammer in our clutches.  (A few privacy tools and proxies strip the referrer from honest visitors too, so keep that in mind when choosing a punishment.)  We can then simply reject the post or we can go so far as to entertain ourselves by abusing the spammer.

You can check for the referrer in two places for the best defense.  First check from the page to which you POST the submission (the target of your FORM).  The second place to check is the page that contains the FORM (as long as it's not your home page).  There we check that the visitor arrived from one of your own pages and not from an outside source.

The first angle is the easiest.  In ASP you can see the URL of the referring page with this code:

<% strPageURL = Request.ServerVariables("HTTP_REFERER") %>

Then your code can check strPageURL to see if it matches the URL of the page containing your form.
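
The comparison itself might look something like the following.  This is only a sketch, written in JavaScript (the other language used in this article) rather than ASP, and the function name isFromFormPage and the example.com address are placeholders I made up for illustration, not part of any real site.  It also assumes you want to ignore query strings and letter case when comparing.

```javascript
// Hypothetical helper: true only when the referrer is exactly the page
// that holds the guest book form.  The URL below is a placeholder.
var FORM_PAGE_URL = "http://www.example.com/guestbook.asp";

function isFromFormPage(referrer) {
	// A missing or empty referrer is treated as a failed check.
	if (!referrer) {
		return false;
	}
	// Assumption: ignore any query string or fragment, and ignore case.
	var bare = String(referrer).split(/[?#]/)[0].toLowerCase();
	return bare === FORM_PAGE_URL.toLowerCase();
}
```

In ASP the same test would be run against strPageURL before the post is accepted.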

The second angle takes a little more work.  In JavaScript the referrer is available in the document.referrer property.  Don't be misled by the different spellings - neither one is a typographical error.  The HTTP header, and with it the ASP server variable, is spelled "referer" with a single "r", while the JavaScript property uses the dictionary spelling.  What we need to do is pass the referrer as a hidden value in your form, by including this HTML/JavaScript code before the applicable </FORM> tag in the web page containing your form:

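<script language="JavaScript"><!--
	// Escape & and " so an odd referrer can't break out of the value attribute
	var ref = document.referrer.replace(/&/g, '&amp;').replace(/"/g, '&quot;');
	document.write('<input type="hidden" name="Referrer" value="' + ref + '">');
// --></script>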

Sure, we could have checked the referrer immediately on the form page, but this is more entertaining.  By passing along the value to the target of your form instead of using it right away, you can delay your handling of the spammer and make the spammer waste time and effort filling out the form.

You can grab this passed-along value of the form page referrer in ASP when the form is posted by using this ASP code in the form target page:

<% strReferrer = Request("Referrer") %>

This way, the page that processes your form can see whether or not the form's page had a valid referrer.
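
Since the form page may be reached from any page on your site, this check is better done against the host name than against one exact URL.  Here is a rough JavaScript sketch of that idea.  The names isFromMySite and MY_HOSTS and the example.com hosts are hypothetical, and the URL object it relies on exists in current browsers and Node.js but not in classic ASP, where you would do the equivalent string comparison on strReferrer.

```javascript
// Hypothetical helper: true when the referrer points at any page
// on one of our own hosts.  The host names are placeholders.
var MY_HOSTS = ["www.example.com", "example.com"];

function isFromMySite(referrer) {
	if (!referrer) {
		return false;              // nothing was passed along
	}
	var host;
	try {
		host = new URL(referrer).hostname.toLowerCase();
	} catch (e) {
		return false;              // not a parseable absolute URL
	}
	// Compare the whole host name so that look-alikes such as
	// www.example.com.evil.net do not slip through.
	return MY_HOSTS.indexOf(host) !== -1;
}
```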

If you wish to be very sophisticated, you can use a cookie to track how many pages on your site have been read by the visitor, and choose a minimum number of page hits before you consider a visitor to be truly a guest.  I haven't seen the need, yet.
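
For the curious, the bookkeeping for such a counter is small.  The JavaScript below is a sketch only: the cookie name pagesSeen, the threshold of 3 and the three function names are all my own inventions for illustration.  Bear in mind that the cookie lives on the visitor's machine, so the count is a hint and not proof.

```javascript
var MIN_PAGES = 3;   // assumed threshold; pick your own

// Read the current count out of a cookie string such as document.cookie.
function readPageCount(cookieString) {
	var match = /(?:^|;\s*)pagesSeen=(\d+)/.exec(cookieString || "");
	return match ? parseInt(match[1], 10) : 0;
}

// Build the cookie text that records one more page view.
function nextCookie(cookieString) {
	return "pagesSeen=" + (readPageCount(cookieString) + 1) + "; path=/";
}

// True once the visitor has viewed enough pages to count as a guest.
function isRealGuest(cookieString) {
	return readPageCount(cookieString) >= MIN_PAGES;
}
```

On every page of the site you would run document.cookie = nextCookie(document.cookie); and the guest book would then test the count before accepting a post.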

Here a link, there a link...

This third angle on spam prevention is a little easier.  If the post contains many instances of "http" and HTML tags, e.g. "<A HREF=...>", etc., and/or you find those strings in fields meant for names or other text, then we know we've got a spammer.  You can use regular expressions to easily count the occurrences.  Here's a function in ASP that takes a regular expression and the character string to be checked, and returns the number of matches.  Included is a sample expression:

<% Function RegExpCount(strPattern, strString)
	Dim regEx, Match, Matches
	Set regEx = New RegExp
	regEx.Pattern = strPattern
	regEx.IgnoreCase = True
	regEx.Global = True
	Set Matches = regEx.Execute(strString)
	RegExpCount = Matches.Count
End Function
intIllegalTags = RegExpCount("<a href|<[a-z]>|</[a-z]>|<br.?>|http", strGuestComments) %>

In this example, strGuestComments contains the text of the attempted guest book post.  If you decide that someone is permitted to post just one bare URL in your guest book, check whether intIllegalTags is greater than 1; otherwise check for it being greater than 0.  Keep in mind that a complete <A HREF="http://..."> ... </A> link counts as three matches (the opening tag, the "http" and the closing tag), so it is rejected either way.
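
To get a feel for how the sample expression scores different posts, here is the same counting logic as a small JavaScript sketch.  The function name countMatches and the sample posts are made up for illustration; only the pattern comes from the ASP code above.

```javascript
// Count case-insensitive matches of a pattern in a piece of text,
// mirroring what the ASP RegExpCount function does.
function countMatches(pattern, text) {
	var found = String(text).match(new RegExp(pattern, "gi"));
	return found ? found.length : 0;
}

var PATTERN = "<a href|<[a-z]>|</[a-z]>|<br.?>|http";

// An honest comment with no links or tags scores 0.
var plain = countMatches(PATTERN, "Great site, thanks!");

// A bare URL scores 1 (just the "http").
var bareUrl = countMatches(PATTERN, "See http://example.com for more");

// A full link scores 3: "<a href", "http" and "</a>".
var anchor = countMatches(PATTERN, '<a href="http://example.com">x</a>');
```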

Now that we've got 'em...

Once we've caught the spammer committing a crime, we need to choose a punishment.  We can simply throw an error message and keep the post from being entered, or we can issue threats, or we can just waste the spammer's time.  Since the spammers like to waste our time, I enjoy the idea of wasting theirs.  A critical component in this scheme, since my site is ASP-based, is this:

http://authors.aspalliance.com/stevesmith/articles/sleeptimer.asp

Steve Smith's SleepTimer is a DLL which, once registered on your Microsoft IIS-based web server, allows you to insert processing delays into your web pages.  When I detect spammer activity, I use delays of a minute or more before rejecting their submission.  The more links they attempt to post, the longer the response is delayed.
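
One way to scale the delay with the number of links is sketched below in JavaScript.  The one-minute base comes from the paragraph above, but the 15 seconds per link, the five-minute cap and the function name delayFor are assumptions of mine, not the values this site actually uses.  The cap is there because a web server may cut off a script that runs for too long.

```javascript
var BASE_SECONDS = 60;       // "a minute or more"
var SECONDS_PER_LINK = 15;   // assumed penalty per link found
var MAX_SECONDS = 300;       // assumed upper limit

// Work out how many seconds to keep a detected spammer waiting.
function delayFor(linkCount) {
	// Treat anything that isn't a positive number as zero links.
	var n = Math.max(0, Math.floor(Number(linkCount) || 0));
	return Math.min(BASE_SECONDS + n * SECONDS_PER_LINK, MAX_SECONDS);
}
```

The result would be passed to whatever routine performs the actual sleeping.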

Now the big question is, do you tell them why their post is being rejected after they've waited?  Or do you even tell them it's been rejected at all?  I'm not sure which is more fun - telling them to piss off or letting them waste more time trying to figure out if things are working.

Here's a tip - if you're going to keep your spammers waiting, you should make sure they don't get bored or frustrated too quickly.  So instead of letting them stare at a blank page while your delay is going, you should give them something to chew on.  That should keep them from hitting the [Back] or [Stop] buttons prematurely.  This example displays a creeping line of asterisks as the seconds go by:

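<% Sub Delay(seconds)
	' objTimer must already hold an instance of the SleepTimer component,
	' created with Server.CreateObject as described in its documentation.
	Dim x
	Response.Write "<p>Working - please be patient"
	For x = 1 To seconds
		Call objTimer.DoSleep(1000)	' one pause per asterisk
		Response.Write " *"
	Next
	Response.Write "</p>"
End Sub %>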

But to use this you need to make sure the first line of your ASP code contains:

<% Response.Buffer = false %>

Turning off buffering makes sure that the web browser sees everything as it's happening.  Otherwise the server holds back ALL the HTML on your page until the script has finished, and nothing appears in the browser until then.  And if you're using shared borders in Microsoft FrontPage, turn them off.  Otherwise your page gets buffered regardless of the setting in ASP.


Entire contents Copyright (C) 1994-2015 Brad Berson and Bytebrothers Internet Services
Page updated February 12, 2009.  See Terms and Conditions of use!