Anti-Spam Guestbook Script
ALL Versions
COPYRIGHT @ 2005 by Aubrey Millard
How to set up the word filter
The word filter is just a text file. The script reads it and compares each word in the file
to the entries submitted. The filter looks at ALL the fields, since some spammers put their
product in the name field. You can also catch a spammer based on the URL, even a partial one.
Each word or phrase that you put in the file must be on its own line. Don't leave any blank
lines. They won't hurt the script, but they are just more work for it to do. You need to be careful when
adding words to the filter list. We want to prevent spammers from posting, but we don't want
someone who is genuine to get caught by it.
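Here is a minimal sketch in Python of the idea described above. It is not the script's own code. It assumes each line of the filter file is treated as a regular expression and that matching ignores upper/lower case; the names load_filter and find_match, the sample entry and the filter lines are all made up for illustration.

```python
import re

def load_filter(lines):
    """Turn filter-file lines into compiled patterns, skipping blank lines."""
    patterns = []
    for line in lines:
        entry = line.strip()
        if entry:  # blank lines are harmless, they are just wasted work
            # Assumption: case-insensitive matching
            patterns.append(re.compile(entry, re.IGNORECASE))
    return patterns

def find_match(patterns, fields):
    """Return (field name, filter entry) for the first hit, or None."""
    for name, value in fields.items():
        for pattern in patterns:
            if pattern.search(value):
                return name, pattern.pattern
    return None

# Hypothetical filter file contents (note the blank line) and guestbook entry
filter_lines = ["xenical", "", "in-search-we-trust"]
patterns = load_filter(filter_lines)

entry = {
    "name": "Buy Xenical Online",   # the product is hidden in the name field
    "email": "bob@example.com",
    "message": "great website",
}
result = find_match(patterns, entry)
print(result)
```

Because every field is checked, the entry is caught on the name field even though the message itself looks harmless.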
Here are some examples:
In the word filter: xenical
The filter will catch: **xenical** buy xenical online xenicalxenicalxenical xenical34@hotmail.com
www.xenical-online.com
So as you can see the filter will find the word no matter where they put it. It is powerful but
that is also the reason you must be careful. For example...
In the word filter: sex
The filter will catch: oralsex hardcore-sex sexsexsexsex sexy essex
The first three are offensive, but as you can see the last two are not. I have no problem being called
sexy and I have nothing against people from Essex, England. So how do we get around this? By using
a \b, which tells the filter where a word must start or end.
So in the above example, let's say people from Essex are OK but I don't want to be called sexy.
In the word filter: \bsex
The filter will catch: sexsexsexsex sexy sexual
So as you can see, by putting a \b in front of the word, the filter will only find sex at the beginning of a word.
(A hyphen or a space also starts a new word, so hardcore-sex is still caught.)
NOTICE: there are no spaces between the \b and the word. It has to be \bword\b and NOT \b word \b
If I really JUST hated the word "sex"....
In the word filter: \bsex\b
The filter will catch: "Go here for sex movies" "It's sex o'clock already"
So you can see that the second one was probably just an honest typo, which is why I don't recommend that
you put words like sex in the filter. But you can see how the \b can help to distinguish between potential spam words
like "anal" and innocent ones like "canal".
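The \b examples above can be tried out with a short Python sketch. It assumes the filter entries behave like ordinary regular expressions (Python's \b works the same way as in Perl-style patterns); the helper name caught is made up for illustration.

```python
import re

def caught(entry, texts):
    """Return the texts in which the filter entry is found."""
    pattern = re.compile(entry)
    return [t for t in texts if pattern.search(t)]

samples = ["oralsex", "hardcore-sex", "sexsexsexsex", "sexy", "essex",
           "sexual", "Go here for sex movies"]

plain = caught(r"sex", samples)          # found anywhere in the text
word_start = caught(r"\bsex", samples)   # only where a word starts with "sex"
whole_word = caught(r"\bsex\b", samples) # only "sex" as a complete word
canal_test = caught(r"\banal", ["canal", "anal"])

print(plain)
print(word_start)
print(whole_word)
print(canal_test)
```

With \bsex the entries oralsex and essex are left alone, and with \banal the word canal is left alone, because in both cases the match would have to start in the middle of a word.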
Another trick is to use .? as in online.?casino. If you do this then the filter will catch any variation like:
online-casino online_casino online*casino etc., so you can catch multiple words with just one entry!
Sometimes even .? is not enough. In the above example the filter would not catch online**casino because of the
** (remember .? is at most one character of any kind). So here is another trick: \W* as in online\W*casino. Basically this tells the
script to look for any number of non-word characters. So it will catch online-casino, online!!casino and online   casino (3 spaces).
Note that the underscore counts as a word character, so \W* will not catch online_casino; keep the .? entry for that.
The W in \W* must be a capital W, not a small w, or it won't work.
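A small Python sketch comparing the two tricks, again assuming the entries are ordinary regular expressions. The helper caught and the third entry, online[\W_]*casino, are not from the filter file; the third one is just one possible way to cover the underscore as well.

```python
import re

def caught(entry, texts):
    """Return the texts in which the filter entry is found."""
    pattern = re.compile(entry)
    return [t for t in texts if pattern.search(t)]

variants = ["onlinecasino", "online-casino", "online_casino", "online*casino",
            "online**casino", "online!!casino", "online   casino"]

# .? allows zero or one character of any kind between the two words
one_char = caught(r"online.?casino", variants)

# \W* allows any number of non-word characters (underscore is a word character)
non_word = caught(r"online\W*casino", variants)

# Hypothetical extra entry: non-word characters OR underscores
either = caught(r"online[\W_]*casino", variants)

print(one_char)
print(non_word)
print(either)
```

The first entry misses everything with two or more separator characters, the second misses only online_casino, and the third catches all seven variants.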
Every once in a while a spammer will get through because he has written nothing offensive. I have had entries where
they say "great website it really helped me with my homework" but the link was to a search engine. The solution was to
put the partial website address (URL) in the filter file. The submitted website was www.in-search-we-trust.com so in the
filter I put in-search-we-trust. Now if they try to post again, no matter how innocent the post is, the filter will
catch that website and prevent them from posting.
In the main filter file you will also notice .info, .biz and .ru. I have found that spammers use these types of domains the most.
You can add more but don't use .com :-)
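The partial-URL trick can be sketched in Python as follows. This assumes the entries are regular expressions; the helper is_blocked and the sample posts are made up. One thing worth knowing under that assumption: an unescaped dot matches any character, so an entry like .info can also hit an innocent " info", while \.info matches only a real dot.

```python
import re

def is_blocked(entries, fields):
    """True if any filter entry is found in any submitted field."""
    return any(re.search(e, v) for e in entries for v in fields.values())

post = {
    "name": "Tom",
    "message": "great website it really helped me with my homework",
    "url": "www.in-search-we-trust.com",
}

# The message is innocent, but the partial address catches the post
url_hit = is_blocked(["in-search-we-trust"], post)

# Hypothetical innocent post containing the word "info"
innocent = {"message": "thanks for the info"}
loose_hit = is_blocked([".info"], innocent)     # the space before "info" matches
strict_hit = is_blocked([r"\.info"], innocent)  # escaped dot: a literal "." only

print(url_hit, loose_hit, strict_hit)
```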
Spammers are creative and we must be just as creative to keep them off our websites.
Content problems
Let's say for example you are a medical practitioner of some sort and people will be talking about drugs and such,
even on your guestbook. Well, now you can't just filter the drug name, like xenical, but I doubt anyone genuine would use the
phrase "buy xenical" or "cheap xenical" or "discount xenical", so you could have those in the filter instead. There
is almost always something unique in a spam post that you can use in the filter. After all, they are going to leave
a link to their product or service. So it would be easy just to add the partial link into the filter, like: xenical-online.com
I had one spammer that was so innocent he didn't leave a link or use any words that I could safely put in the
filter. But he did post a phone number, so I used that and he hasn't been able to post since.
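As a rough Python illustration of filtering on phrases, links and phone numbers instead of the bare drug name: the helper blocked, the sample messages and the phone number 555-0100 are all made up, and case-insensitive matching is an assumption.

```python
import re

# Phrases and a partial link instead of the bare word "xenical",
# plus a hypothetical phone number left by a spammer
entries = ["buy xenical", "cheap xenical", "discount xenical",
           "xenical-online.com", "555-0100"]

def blocked(text):
    """True if any filter entry is found in the text (case-insensitive)."""
    return any(re.search(entry, text, re.IGNORECASE) for entry in entries)

genuine = blocked("My doctor prescribed xenical last year, any advice?")
spam_phrase = blocked("Buy Xenical today at low prices")
spam_link = blocked("more details at www.xenical-online.com")
spam_phone = blocked("call 555-0100 for a great offer")

print(genuine, spam_phrase, spam_link, spam_phone)
```

The genuine question about the drug gets through, while the three spam posts are each caught by something unique to them.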
The filter list that comes with this script is a really good start. You may want to delete some and add others.
If you look at the list you will find common swear words, drugs (the latest spam craze), junk products like
online casinos/phone cards etc., plus there are a few partial internet addresses there as well. Try to be selective
in what you put in the filter. Don't put in 5 words/phrases to try to catch one word. The bigger the filter list
is, the longer the script takes to run.
What the filter won't catch
The filter won't catch idiots and people who just want to mess up your guestbook. Like the guy that leaves an
entry like "sdfagfjrgpwmfvprkvapvfqpevfnervnelrvnervnerlvnerlvnwerlvnwervnwernvewrn". But never fear, the script
automatically checks for these kinds of junk posts and will catch them.
This is also where BANS come in. When someone leaves a post the script records their IP address, so
they might get through once, but Ban them and they won't be able to post again from the same IP.
More about BANS in the Script documentation.
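The general idea of an IP ban can be sketched in a few lines of Python. This is only an illustration of the concept, not how the script stores or applies its bans (see the Script documentation for that); the names banned_ips, post_log, submit and ban, and the IP addresses, are all made up.

```python
banned_ips = set()   # hypothetical ban list
post_log = []        # hypothetical record of (ip, message) pairs

def submit(ip, message):
    """Accept a post unless the IP is banned; keep the IP with the post."""
    if ip in banned_ips:
        return False
    post_log.append((ip, message))
    return True

def ban(ip):
    """Add an IP address to the ban list."""
    banned_ips.add(ip)

first = submit("203.0.113.7", "sdfagfjrgpwmfvprkv")  # gets through once
ban("203.0.113.7")                                   # owner bans the recorded IP
second = submit("203.0.113.7", "more junk")          # same IP is now refused
other = submit("198.51.100.4", "Nice site!")         # other visitors are unaffected

print(first, second, other)
```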
Setting up a filter for URL Scans
If you are using the URL scanning feature you may want to use a separate filter file.
If you have for example the word "insurance" in your main filter there is a good chance that
an innocent website will have that word on the page somewhere. On your guestbook though it
is probably not likely (unless you run an insurance company) that someone will use that word
unless they are a spammer.
The best way around this problem is to use the separate file for URL scans "scanfile.dat". Use only the worst
and most obvious words, swear words/porn related etc. There is an example file included in the zip
file. You can edit it by hand or create your own.
To create your own, just create a text file and name it whatever you want. Put it in the same folder as
the script. Just add the words you want to the file, one word/entry per line.
When you are done, make sure to change the variable SCANFILE to the name of the text file you have
created.
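To show why a separate, shorter list helps for URL scans, here is a Python sketch under some assumptions: the file name myscan.txt, the sample entries and the sample page text are made up, and the way SCANFILE is read here is only an illustration, not the script's own code.

```python
import os
import re
import tempfile

def load_entries(path):
    """Read one filter entry per line, ignoring blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]

def is_flagged(entries, text):
    """True if any entry is found in the text (case-insensitive, an assumption)."""
    return any(re.search(entry, text, re.IGNORECASE) for entry in entries)

# Create a hypothetical scan file with only the most obvious entries
folder = tempfile.mkdtemp()
SCANFILE = os.path.join(folder, "myscan.txt")
with open(SCANFILE, "w", encoding="utf-8") as handle:
    handle.write("hardcore-sex\nonline\\W*casino\n")

main_filter = ["insurance", "xenical"]   # hypothetical main filter entries
scan_filter = load_entries(SCANFILE)

# Text of an innocent linked page that happens to mention insurance
page_text = "Welcome to my family site. We also list local car insurance agents."

main_result = is_flagged(main_filter, page_text)
scan_result = is_flagged(scan_filter, page_text)
print(main_result, scan_result)
```

The main filter would reject the innocent page because of the word "insurance", while the shorter scan list lets it through.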