Solving spam problems without using captcha

Some of you know that automated spam bots for the web can be a pain in the ass.

And while there are many ways to deal with this problem, most of them complicate things for regular users and are not hacky enough to satisfy me. 🙂 Captchas, math problems and the like mostly work fine, but they also scare away some users. There are also paid tools (Akismet?) that help with spam, but they don’t catch all of it.

I will share how I stop almost all automated spam messages to my WordPress blog without complicating things for the user.

Knowing the limitations of HTTP libraries and “exploiting” them

I have written a lot of scripts that automate GETting and POSTing things on the web. Many libraries help with that, and one of the most popular (if not the most popular) is Curl.

While Curl is great, it still lacks some features, such as building a zero-length file upload part in multipart POST data. Basically, it can’t simulate a file upload field that has no file selected. Since many spam bots use Curl and similar libraries, this can be used to tell a real browser from a script.

Example usage in PHP:

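// True when the empty file part is missing or a file was attached: treat the request as spam.
if (!isset($_FILES['spamcheck']) || $_FILES['spamcheck']['name'] != '' || $_FILES['spamcheck']['size'] != 0) { /* reject the comment here */ }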
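The same check can be wrapped in a small function so it is easy to reuse and try out. This is only a sketch: the function name spamcheck_failed is made up for illustration, and it assumes the form is posted as multipart/form-data with a hidden file field named spamcheck. It takes an array shaped like $_FILES instead of reading the superglobal directly.

```php
<?php
// Hypothetical helper (not from any library): returns true when the
// request should be treated as spam. $files is expected to have the
// same shape as PHP's $_FILES superglobal.
function spamcheck_failed(array $files, string $field = 'spamcheck'): bool
{
    // A browser sends the file part even when no file is chosen;
    // a script that cannot build an empty part leaves it out.
    if (!isset($files[$field])) {
        return true;
    }

    $part = $files[$field];

    // Assumption: the field is hidden, so a real visitor never picks
    // a file. Any name or any bytes mean something else filled it in.
    return ($part['name'] ?? '') !== '' || (int) ($part['size'] ?? 0) !== 0;
}
```

In a comment handler this would be called as spamcheck_failed($_FILES) before the comment is saved.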

Not just Curl

I haven’t used Curl for a year now because I switched to Perl’s LWP, and it has the same problem. I solved it by writing my own function for building the multipart form data. It can simulate the browser’s behavior, but that doesn’t happen by default.

And while this was rather simple with LWP, doing it with Curl (as used from PHP) would probably require recompiling the Curl library and then recompiling the language module that wraps it.
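To show what such a hand-built request body has to contain, here is a rough PHP sketch. It is not my Perl function, and the name build_multipart_with_empty_file and its parameters are invented for this example. The important bit is the last part: a file part with filename="" and zero bytes of content, which is what a browser sends for a file field with nothing selected.

```php
<?php
// Hypothetical helper: builds a multipart/form-data body with plain
// text fields plus one file part that has an empty filename and no
// content, the way a browser submits an untouched file input.
function build_multipart_with_empty_file(array $fields, string $fileField, string $boundary): string
{
    $eol = "\r\n";
    $body = '';

    // Ordinary text fields: one part each.
    foreach ($fields as $name => $value) {
        $body .= '--' . $boundary . $eol
            . 'Content-Disposition: form-data; name="' . $name . '"' . $eol . $eol
            . $value . $eol;
    }

    // The empty file part: filename is "", the content is zero bytes.
    $body .= '--' . $boundary . $eol
        . 'Content-Disposition: form-data; name="' . $fileField . '"; filename=""' . $eol
        . 'Content-Type: application/octet-stream' . $eol . $eol
        . $eol;

    // Closing boundary.
    return $body . '--' . $boundary . '--' . $eol;
}
```

The body still has to be sent with a matching Content-Type: multipart/form-data; boundary=... header.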

Changing field names

Another trick I use is changing field names and adding additional ones that are meant to be left empty.

This method also seems to catch some of the spam messages.
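A minimal sketch of how such a check could look in PHP. The field names here (url and email as traps, c_8f3a as the renamed comment field) and the function name are placeholders picked for the example, not the ones my blog uses.

```php
<?php
// Hypothetical names, for illustration only.
const TRAP_FIELDS = ['url', 'email'];   // familiar names that must stay empty
const REAL_COMMENT_FIELD = 'c_8f3a';    // renamed field the real form uses

// Returns true when the POST data looks like it came from a bot.
function trap_fields_tripped(array $post): bool
{
    // A script posting to the stock field names never sends the renamed one.
    if (!isset($post[REAL_COMMENT_FIELD])) {
        return true;
    }

    // Trap fields are invisible to people, so anything in them is suspicious.
    foreach (TRAP_FIELDS as $name) {
        $value = $post[$name] ?? '';
        if (!is_string($value) || trim($value) !== '') {
            return true;
        }
    }

    return false;
}
```

It would be called as trap_fields_tripped($_POST) before the comment is processed.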

From the user’s point of view

The user obviously doesn’t see those extra fields because they are hidden with CSS. The comment form looks like any other standard comment form.
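Roughly, the markup could be generated like this. Again a sketch with made-up names (render_comment_form, the extra-field class, the field names), not a copy of my theme. Two things matter: the form needs enctype="multipart/form-data" or the file field is not sent as a file part at all, and fields hidden with display: none are still submitted by browsers.

```php
<?php
// Hypothetical markup generator; class and field names are illustrative.
function render_comment_form(string $action): string
{
    // Escape the URL before putting it into an attribute.
    $action = htmlspecialchars($action, ENT_QUOTES, 'UTF-8');

    return <<<HTML
<style>.extra-field { display: none; }</style>
<form action="{$action}" method="post" enctype="multipart/form-data">
  <textarea name="c_8f3a"></textarea>
  <input class="extra-field" type="text" name="url" value="">
  <input class="extra-field" type="file" name="spamcheck">
  <input type="submit" value="Post comment">
</form>
HTML;
}
```

Calling echo render_comment_form('/post-comment.php') prints the form; the path is just an example.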

Conclusion

I have been using these techniques for about a year now and haven’t had any spam problems since. I also have a huge log file of spam that was caught using these methods. This really does work. 🙂

Of course, it won’t help against directly targeted spam, but in that case captchas won’t help either.

And I understand that writing about this will probably contribute to fixing Curl and other libraries, eventually making this protection method useless. Well, at least they will finally fix those libraries! 😀