SPAM filtering using Procmail

Last modified 2003 DEC 03 21:27:37 GMT
Procmail isn't just for filtering spam, but for anyone who already uses procmail, spam filtering is an obvious job for it. It is far more capable than anything your email client might offer for mail filtering.

I have a LOT of spam filters. I've posted excerpts of them from time to time on the procmail discussion list, but simply haven't had the time to collect them all up into a library and publish them as a cohesive list. Perhaps someday. In the meantime, a select few rules are linked from here.

Note that I use a system which I call "SPAMMISHNESS", wherein spam filters do not flag a message with a simple true/false (or take immediate action), but rather add a value to a cumulative score, which is checked only after ALL THE SPAM CHECKS HAVE BEEN PERFORMED. Yes, this means that my own spam filtering doesn't take a shortcut out when a definite spam characteristic is encountered, but it does mean that I see the stats on ALL of the filters, and it definitely shows in the results - despite receiving circa 600-700 messages in my own mailbox each day, fewer than a dozen spams PER MONTH reach my inbox.

One benefit of the SPAMMISHNESS method is that a lot of characteristics which are not 100% positive identifiers of spam - but which are most often encountered on spam - can be used to contribute to the final evaluation. Consider it something like circumstantial evidence - individually, they're insufficient to gain a conviction, but when viewed on the whole, a pattern develops.
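
A minimal sketch of the cumulative approach, written in Python for illustration rather than as procmail recipes. The check names, weights, and threshold are invented for the example and are not the actual filters. The point it shows is that every check runs and contributes to the score, with no early exit.

```python
from email import message_from_string

# Hypothetical indicators as (name, weight, test). None of them is
# conclusive on its own; the names and weights are assumptions.
CHECKS = [
    ("subject_all_caps", 2, lambda m: (m["Subject"] or "").isupper()),
    ("missing_message_id", 3, lambda m: m["Message-ID"] is None),
    ("subject_many_bangs", 2, lambda m: (m["Subject"] or "").count("!") >= 3),
]

THRESHOLD = 5  # assumed cutoff, applied only after all checks have run


def spammishness(msg):
    """Run every check and return (total score, names of checks that hit)."""
    score = 0
    hits = []
    for name, weight, test in CHECKS:  # deliberately no early exit
        if test(msg):
            score += weight
            hits.append(name)
    return score, hits


def is_spam(msg):
    """Compare the accumulated score against the threshold."""
    score, _ = spammishness(msg)
    return score >= THRESHOLD
```

Because the list of hits is returned along with the score, per-filter statistics can be kept for every message, which is the trade-off described above for not short-circuiting.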

After tiring of receiving messages in character sets which I could not read, I wrote a reasonably extensive filter to flag messages which are identified as being in foreign (to me) character sets. In a nod to Amerikuns and white trash crackers everywhere, I named the filter "furrin.rc".
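
The idea can be sketched as follows, again in Python rather than as the procmail rules in furrin.rc, whose contents are not reproduced here. The READABLE set is an assumption standing in for whatever character sets the recipient can actually read. The sketch looks at the declared charset of each MIME part and at any encoded words in the Subject header.

```python
from email import message_from_string
from email.header import decode_header

# Assumed set of charsets the recipient can read; adjust to taste.
READABLE = {"us-ascii", "iso-8859-1", "iso-8859-15", "windows-1252", "utf-8"}


def unreadable_charsets(msg):
    """Return the set of declared charsets that are not in READABLE."""
    found = set()
    # Charset declared in the Content-Type of each MIME part.
    for part in msg.walk():
        charset = part.get_content_charset()
        if charset and charset.lower() not in READABLE:
            found.add(charset.lower())
    # Charset named in RFC 2047 encoded words in the Subject header.
    subject = msg["Subject"]
    if subject:
        for _text, charset in decode_header(subject):
            if charset and charset.lower() not in READABLE:
                found.add(charset.lower())
    return found
```

A non-empty result would be one more contribution to the SPAMMISHNESS score, not a verdict by itself.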

spewhosts is a ruleset for messages claiming to be from ISP hosts which spam is commonly forged to be from (such as hotmail or msn). It checks whether the expected mail hosts (empirically determined) appear in the headers - if they're not present, then the message is suspect as a forgery. There are times when someone might validly set their From: to a freemail service and send the message through their own server - but then, that's why I use SPAMMISHNESS.
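
A rough Python sketch of that kind of check, not the spewhosts ruleset itself. The EXPECTED_RELAYS table is hypothetical: the real host patterns were determined empirically and are not listed here. The sketch takes the domain of the From: address and, if it is one of the commonly forged ones, looks for a matching host in the Received: headers.

```python
import re
from email import message_from_string
from email.utils import parseaddr

# Hypothetical mapping from claimed sender domain to a pattern that a
# genuine message is assumed to show in at least one Received: header.
EXPECTED_RELAYS = {
    "hotmail.com": re.compile(r"\bhotmail\.com\b", re.I),
    "msn.com": re.compile(r"\b(?:msn|hotmail)\.com\b", re.I),
}


def looks_forged(msg):
    """True if From: claims a listed domain but no Received: header matches."""
    _name, addr = parseaddr(msg.get("From", ""))
    domain = addr.rpartition("@")[2].lower()
    pattern = EXPECTED_RELAYS.get(domain)
    if pattern is None:
        return False  # domain is not on the commonly-forged list
    received = msg.get_all("Received", [])
    return not any(pattern.search(line) for line in received)
```

Since a legitimate sender may relay freemail-addressed mail through their own server, a True result here is best treated as one more addition to the cumulative score and not as grounds for rejection.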


