As you go through your mail, you may find messages in your spool file that are actually spam; transfer these to the spam mailbox. If you find mail in your spamfound mailbox that is really ham, transfer it to your ham mailbox. The unsure mailbox may contain a mix of ham and spam, which should likewise be transferred to the appropriate mailboxes. Initially SpamBayes may make a large number of mistakes, but with time and training the mistakes should shrink to the point where your spool file is nearly or even totally spam free. Training should be done with about equal amounts of spam and ham, so you might purposely transfer some messages from your spool file (your good mail file) to the ham folder to even out the number of spam and ham messages used in training. More about this later. The data SpamBayes gathers from training is stored in a database file called .hammie.db in your home directory.
The filtering program is called sb_filter.py, and you will normally run it through a .procmailrc script that you place in your home directory.
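One way to check whether your training mailboxes are roughly balanced is to count the messages in each. The following is a minimal sketch in Python 3 using only the standard mailbox module. It is not part of SpamBayes, the function names count_messages and training_balance are made up for this illustration, and it assumes your ham and spam folders are ordinary mbox files.

```python
import mailbox
from pathlib import Path


def count_messages(mbox_path):
    """Return the number of messages in an mbox file (0 if the file is missing)."""
    path = Path(mbox_path)
    if not path.exists():
        return 0
    # create=False: never create a mailbox as a side effect of counting.
    box = mailbox.mbox(str(path), create=False)
    try:
        return len(box)
    finally:
        box.close()


def training_balance(ham_path, spam_path):
    """Return (ham_count, spam_count, ham_count - spam_count)."""
    ham = count_messages(ham_path)
    spam = count_messages(spam_path)
    return ham, spam, ham - spam
```

For example, training_balance(Path.home() / "Mail" / "ham", Path.home() / "Mail" / "spam") returns both counts and their difference. A positive difference means you have more ham than spam in your training folders.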
Procmail
The following assumes that you are reading mail in the CS.Trinity.Edu domain. This mail is processed by Mail.CS.Trinity.Edu (aka Sol). The mail transport agent, sendmail, uses procmail to deliver your mail to your inbox and mail directory files. You will probably find sb_filter.py the easiest application to integrate into your mail environment. The .procmailrc file that you place in your home directory is actually run by the server machine, but since the client machines and the server share your home directory, you can do all the procmail configuration, and train .hammie.db, on a client; these files will then be used for filtering on the server. The filtering program used in the SpamBayes system, sb_filter.py, is found in the directory /usr/bin, which should be in your default path.
You should perform the following steps to set up your SpamBayes system for mail filtering.
1. Create the database that SpamBayes will use to test your incoming mail:
sb_filter.py -d $HOME/.hammie.db -n
2. Train it on your existing mail using the following command (-g is the flag for known good mail, and -s is the flag for known spam):
/usr/local/bin/sb_mboxtrain.py -d $HOME/.hammie.db -g $HOME/Mail/ham -s $HOME/Mail/spam
3. Add the following recipes to the top of your .procmailrc to get the spam and unsure mail out of the way, allowing everything else to be filtered by your normal procmail recipes. The last clause in the script is what causes the ham to be put in the spoolfile inbox. If you want to use POP3 to receive your ham mail, leave out this last clause.
PATH=$HOME/bin:/usr/bin:/usr/ucb:/bin:/usr/local/bin
SHELL=/bin/sh
MAILDIR = $HOME/Mail # You'd better make sure it exists
LOGFILE = $MAILDIR/procmail.log
LOCKFILE= $HOME/.lockmail
:0fw:hamlock
| /usr/local/bin/sb_filter.py -d $HOME/.hammie.db
:0
* ^X-Spambayes-Classification: spam
${MAILDIR}/spam
:0
* ^X-Spambayes-Classification: unsure
${MAILDIR}/unsure
:0
* ^X-Spambayes-Classification: ham
${MAILDIR}/inbox
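To see what the three sorting recipes do, here is a rough Python 3 equivalent of their header test. It is only an illustration: procmail does the real delivery, and the name target_folder is hypothetical. It assumes, as the recipes do, that sb_filter.py has added an X-Spambayes-Classification header whose value begins with spam, unsure, or ham.

```python
from email import message_from_string

# Classification label -> mailbox name under $HOME/Mail, as in the recipes.
FOLDERS = (("spam", "spam"), ("unsure", "unsure"), ("ham", "inbox"))


def target_folder(raw_message):
    """Return the mailbox name a message would be sorted into, or None.

    None means no recipe matched, so the message would fall through to
    whatever recipes follow in .procmailrc.
    """
    msg = message_from_string(raw_message)
    value = msg.get("X-Spambayes-Classification", "").strip().lower()
    for label, folder in FOLDERS:
        # Prefix match, like the "^X-Spambayes-Classification: spam" condition.
        if value.startswith(label):
            return folder
    return None
```

A message with no classification header returns None, which corresponds to mail that sb_filter.py never tagged and that your remaining procmail recipes would handle.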
4. You train the database using the following command, which assumes that you have collected ham in a file called ham and spam in a file called spam.
/usr/local/bin/sb_mboxtrain.py -d $HOME/.hammie.db -g $HOME/Mail/ham -s $HOME/Mail/spam
You might automate the process by constructing a shell script called trainsb.
#!/bin/bash
#script: trainsb
/usr/local/bin/sb_mboxtrain.py -d "$HOME/.hammie.db" -g "$HOME/Mail/$1" -s "$HOME/Mail/$2"
Then trainsb would be used as follows:
trainsb ham spam
It turns out to be useful to train your mail using both the ham and spam that SpamBayes has already identified.
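If you prefer to drive the training from Python rather than a shell script, the command line can be assembled as a list. This is a sketch only: train_command is a hypothetical helper, and it uses just the -d, -g and -s flags shown above together with the paths assumed in this document.

```python
import os


def train_command(ham_name, spam_name, home=None):
    """Build the sb_mboxtrain.py argument list for two mailboxes in ~/Mail."""
    if home is None:
        home = os.path.expanduser("~")
    return [
        "/usr/local/bin/sb_mboxtrain.py",
        "-d", f"{home}/.hammie.db",        # training database
        "-g", f"{home}/Mail/{ham_name}",   # known good mail
        "-s", f"{home}/Mail/{spam_name}",  # known spam
    ]
```

The resulting list can be handed to subprocess.run(cmd, check=True) to actually run the training. Passing a list rather than a single string avoids shell quoting problems with unusual mailbox names.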
SpamBayes configuration file
Your SpamBayes configuration file is called .spambayesrc and is located in your home directory.
Start it out with the following three lines and add lines directly or through the web interface as needed or required.
[Storage]
persistent_use_database=dbm
persistent_storage_file=~/.hammie.db
If you wish to get a list of all the configuration options, or just some of them, you can use the following commands
(note that each of these commands should be placed on a single line):
python -c "from spambayes.Options import options ; print options.display_full()"
The command above will print out a complete list of the options, including a description of each option and its default value. You can also look up the options for a single section, if you know its name:
python -c "from spambayes.Options import options ; print options.display_full('section_name')"
Or just a single option:
python -c "from spambayes.Options import options ; print options.display_full('section_name', 'option_name')"
If you want a list of all the sections, you can use this command:
python -c "from spambayes.Options import options ; print options.sections()"
POP3
If you do your training as indicated above from a client machine, then strictly speaking you do not need to use POP3 or IMAP. However, SpamBayes includes a POP3 proxy server that you can use to train your mail through a web interface. To start it, type:
sb_server.py -b &
and it will respond:
SpamBayes POP3 Proxy Version 1.0rc2 (June 2004) and engine SpamBayes Engine Version 0.3 (January 2004).
Loading database...
User interface url is http://localhost:8880/
Open a browser and go to http://localhost:8880/ and you will see a web page that allows you to configure the SpamBayes POP3 proxy server. Click on Configuration in the upper right, fill in the mail server's address (mail.cs.trinity.edu) and your username and password, and tell the server to listen on port 8110. Another option deals with the cutoffs for spam and ham. sb_filter.py assigns each mail message a level of spamness. If this level is .2 or less, the message is identified as ham. If it is .9 or greater, it is identified as spam. Between .2 and .9, the mail is identified as unsure. For now, leave the default cutoff values of .9 and .2, as well as the other parameters. These can be changed later, once you have seen how SpamBayes reacts to your mail stream. Save the configuration and then return to the home page. There you will see that you can identify mail files to be used in training, as well as enter individual messages to be used in training.
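The cutoff rule just described can be written out in a few lines of Python 3. This is a sketch of the rule as stated here, not the SpamBayes source code. The name classify is hypothetical, and the handling of scores that fall exactly on a cutoff is an assumption taken from the wording above.

```python
def classify(score, ham_cutoff=0.2, spam_cutoff=0.9):
    """Map a spamness score in [0, 1] to 'ham', 'unsure', or 'spam'."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score <= ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    return "unsure"
```

Raising ham_cutoff or lowering spam_cutoff narrows the unsure band, which means fewer messages to review by hand but a greater chance of a misfiled message.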
To exit from sb_server.py just click Save and Exit on the bottom right corner of the main web page.
Further information on SpamBayes can be obtained at their web site: http://www.spambayes.org