EXTERNAL INOCULATION THEORY

Bill Yerazunis recently expressed his theory of inoculation on an anti-spam development list, using the term "vaccination":

  "Part of the problem is that spam isn't stationary, it evolves. That pesky .1% error rate is in some part due to the base mutation rate of spam itself. Maybe the answer is "vaccination". Vaccination is using _one_ person's misery be used to generate some protective agent that protects the rest of the population; only the first person to get the spam actually has to read it. My expectation is this: say you have ten friends, and you all agree to share your training errors. Each of you will (statistically) expect to be the first to see a new mutation of spam about 9% of the time; the other ten friends in this group will have their bayesian filter trained preemptively to prevent this. Net result: you get a tenfold decrease in error rate - down to 99.99% accuracy. With a hundred such (trusted) friends, you may be down to 99.999% accuracy."

DSPAM has taken this concept and rolled it into support for what we call "inoculation groups", which provide the exact functionality Bill describes. This could be considered an "internal inoculation" practice.

On top of this, DSPAM has been designed to support external inoculation as a complement to internal inoculation. Here, instead of having your internal circle of friends inoculate you, you rely on external elements - namely the spammers themselves - to inoculate you. The theory behind external inoculation is this: why put _anyone_ through the misery of being the first to receive a new spam when you can have the spammers themselves send it directly to you? External inoculation can also be combined with internal inoculation by taking the spam you received externally and inoculating your friends with it internally.

Inoculation is a little different from learning, as inoculation causes tokens to be given additional hit counts in an attempt to learn from a single email.
As a result, any form of inoculation should _only_ be attempted after an initial learning phase (perhaps when your filtering accuracy exceeds 99.0%). DSPAM inoculates like this:

1. Every token that doesn't already exist in the database, or that has fewer than two hits, is hit five times.
2. All other tokens are hit twice.

External inoculation is accomplished by creating a covert, external alias that is configured to automatically inoculate your dictionary from any messages it receives. The covert alias can then be published on a series of public newsgroups and websites where it is sure to be harvested by a spammer's tools. One could even proactively subscribe oneself to several different opt-in spam lists, and so on.

The first step is to configure an alias. To do this you would use something like:

  bob_c: "|/path/to/dspam --addspam --inoculate --user bob --corpus"

The 'c' in bob_c stands for 'covert'. We must use a covert alias because if we use something obvious like 'bob-spam', harvester tools will automatically strip the -spam off and spam your real account.

Once the alias is set up, make sure it gets out only on lists where harvesters will grab it and where nobody will send legitimate email to it. It may even be a good idea to put it at the bottom of your tagline in all your publicly archived emails, something like:

  Spammers, send me mail here: bob_c@yourdomain.com

Finally, you can multiply the effects of this by sharing an inoculation group with your friends. If all of your friends have a public covert alias, then you will all be able to inoculate each other should any one of you receive a spam at that alias. What a great way to train your filter!

On top of this, should external inoculation become commonplace to the point where harvesters are picking up as many covert aliases as legitimate email addresses, spammers will start to realize that harvesters are just plain too dumb to tell the difference (the spammers themselves couldn't tell whether mine was one or not).
In the best case, this could put an end to harvester bots by making them counter-productive and therefore obsolete.
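
The hit-count rule given earlier (five hits for tokens that are new or have fewer than two hits, two hits for everything else) can be illustrated with a short Python sketch. This is not DSPAM's actual code. The function name inoculate, the plain dictionary standing in for the token database, and the decisions to read "hits" as spam hits and to count each token once per message are all assumptions made for illustration.

```python
# Illustrative sketch only; not taken from the DSPAM source.
# Assumption: "hits" means a per-token spam hit count held in a dict.
NEW_TOKEN_HITS = 5    # token missing from the database, or fewer than two hits
KNOWN_TOKEN_HITS = 2  # every other token


def inoculate(spam_hits, tokens):
    """Apply the inoculation rule to a dict mapping token -> spam hit count.

    Assumption: each distinct token in the message is counted once,
    no matter how often it appears in the message.
    """
    for token in set(tokens):
        current = spam_hits.get(token, 0)
        if current < 2:
            spam_hits[token] = current + NEW_TOKEN_HITS
        else:
            spam_hits[token] = current + KNOWN_TOKEN_HITS
    return spam_hits


if __name__ == "__main__":
    # Hypothetical token database and message tokens.
    database = {"viagra": 10, "refinance": 1}
    inoculate(database, ["viagra", "refinance", "mortgage", "viagra"])
    for token in sorted(database):
        print(token, database[token])
```

In this sketch a well-known token gains only two hits, while an unseen or barely seen token gains five. That weighting is how a single message can shift the statistics, and it is also why inoculation is best left until after the initial learning phase.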