SpamAssassin
Largely based on https://www.christianroessler.net/tech/2015/spamassassin-dovecot-postfix.html
Packages
apt install spamassassin spamc dovecot-antispam
Config
Enable virtual user support for the trained Bayes databases so that each account can have its own spam filter training data.
--- a/data/etc/default/spamassassin
+++ b/data/etc/default/spamassassin
@@ -17,7 +17,7 @@ ENABLED=0
# make sure --max-children is not set to anything higher than 5,
# unless you know what you're doing.
-OPTIONS="--create-prefs --max-children 5 --helper-home-dir"
+OPTIONS="--create-prefs --max-children 5 --helper-home-dir --virtual-config-dir=/var/vmail/%d/%l/spamassassin --allow-tell --timeout-child 30 --username vmail -x"
# Pid file
# Where should spamd write its PID to file? If you use the -u or
@@ -31,4 +31,4 @@ PIDFILE="/var/run/spamd.pid"
# Cronjob
# Set to anything but 0 to enable the cron job to automatically update
# spamassassin's rules on a nightly basis
-CRON=0
+CRON=1
--- a/etc/spamassassin/local.cf
+++ b/etc/spamassassin/local.cf
@@ -10,6 +10,7 @@
# Add *****SPAM***** to the Subject header of spam e-mails
#
# rewrite_header Subject *****SPAM*****
+add_header all Status _YESNO_, score=_SCORE_ required=_REQD_ tests=_TESTS_ autolearn=_AUTOLEARN_ version=_VERSION_
# Save spam messages as a message/rfc822 MIME attachment instead of
@@ -31,25 +32,25 @@
# Set the threshold at which a message is considered spam (default: 5.0)
#
-# required_score 5.0
+required_score 3.0
# Use Bayesian classifier (default: 1)
#
-# use_bayes 1
+use_bayes 1
# Bayesian classifier auto-learning (default: 1)
#
-# bayes_auto_learn 1
+bayes_auto_learn 1
# Set headers which may provide inappropriate cues to the Bayesian
# classifier
#
-# bayes_ignore_header X-Bogosity
-# bayes_ignore_header X-Spam-Flag
-# bayes_ignore_header X-Spam-Status
+bayes_ignore_header X-Bogosity
+bayes_ignore_header X-Spam-Status
+bayes_ignore_header X-Spam-Flag
# Some shortcircuiting, if the plugin is enabled
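To illustrate how the --virtual-config-dir pattern in the diff above gives each account its own directory, here is a minimal sketch. spamd performs this substitution itself (%d stands for the domain and %l for the local part of the user name it is given); the helper expand_virtual_config_dir and the address alice@example.org are made up for illustration only.

```shell
#!/bin/bash
# Illustrative only: mimics the %d / %l substitution that spamd does internally.
# expand_virtual_config_dir is a hypothetical helper, not part of SpamAssassin.
expand_virtual_config_dir() {
    local pattern=$1
    local address=$2
    local localpart=${address%@*}   # part before the "@"
    local domain=${address#*@}      # part after the "@"
    printf '%s\n' "$pattern" | sed -e "s|%d|$domain|g" -e "s|%l|$localpart|g"
}

expand_virtual_config_dir '/var/vmail/%d/%l/spamassassin' 'alice@example.org'
# prints /var/vmail/example.org/alice/spamassassin
```

With this layout the Bayes database of every account ends up next to its mail directory, which is what the training script further down relies on.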
Training the filter
To train the Bayes database conveniently, you can use dedicated folders for spam and ham and a cron job that periodically feeds their contents to sa-learn.
A script that does just that could look like this:
#!/bin/bash
# Feed the per-user training folders to sa-learn, then move the mail on.
source /usr/local/include/lock_process.sh
lock "SA-LEARN"

MAIL_ROOT=/var/vmail
VMAIL_USER=vmail
SA_SPAM_FOLDER=.sa_spam
SA_HAM_FOLDER=.sa_ham
SPAM_FOLDER=.Spam
HAM_FOLDER=.    # "." is the top level of the Maildir (the inbox)

# learn TYPE DB_FILE SRC_DIR DST_DIR
# Learn every message in SRC_DIR as TYPE (spam or ham), then move it to DST_DIR.
function learn() {
    local sa_type=$1
    local db_file=$2
    local src_dir=$3
    local dst_dir=$4
    if [ -d "$src_dir" ]; then
        # Nothing to do for an empty folder or a missing destination.
        [ -z "$(ls -A "$src_dir")" ] && return
        [ -d "$dst_dir" ] || return
        for mail in "$src_dir"/*; do
            sa-learn "--$sa_type" -u debian-spamd --dbpath "$db_file" --dir "$mail"
            mv -v "$mail" "$dst_dir"
        done
    fi
}

for DOMAIN_DIR in "$MAIL_ROOT"/*; do
    for USER_DIR in "$DOMAIN_DIR"/*; do
        DB_PATH=$USER_DIR/spamassassin
        DB_FILE=$DB_PATH/bayes
        learn spam "$DB_FILE" "$USER_DIR/mail/$SA_SPAM_FOLDER/cur" "$USER_DIR/mail/$SPAM_FOLDER/cur"
        learn ham "$DB_FILE" "$USER_DIR/mail/$SA_HAM_FOLDER/cur" "$USER_DIR/mail/$HAM_FOLDER/cur"
        # spamd runs as the vmail user, so hand the database back to it.
        [ -d "$DB_PATH" ] && chown -R "$VMAIL_USER" "$DB_PATH"
    done
done

unlock "SA-LEARN"
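To run the script periodically, a cron entry along the following lines could be used. This is only a sketch: the file name /etc/cron.d/sa-learn-folders, the script path /usr/local/sbin/sa-learn-folders.sh and the 15-minute interval are assumptions, not taken from the original setup. The job runs as root because the script changes the ownership of the Bayes database.

```
# /etc/cron.d/sa-learn-folders (hypothetical file name)
# m    h  dom mon dow user command
*/15   *  *   *   *   root /usr/local/sbin/sa-learn-folders.sh
```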
Things to note:
- To prevent concurrent runs, the script uses the lock function shown in the article "Locking mechanism in shell scripts".
- The script assumes that the folders Spam, sa_spam and sa_ham already exist.
- The script assumes the virtual user directory structure presented in Postfix.
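The contents of lock_process.sh are not reproduced here. As a rough idea of what a pair of lock and unlock functions could look like, here is a minimal sketch based on the atomicity of mkdir. It is a hypothetical stand-in, not the code from the referenced article, and the LOCK_ROOT variable and its default of /tmp are assumptions.

```shell
#!/bin/bash
# Hypothetical stand-in for /usr/local/include/lock_process.sh;
# the real file from the referenced article may work differently.
LOCK_ROOT=${LOCK_ROOT:-/tmp}   # assumed location for the lock directories

# lock NAME: take the lock, or exit if another run already holds it.
lock() {
    local lock_dir="$LOCK_ROOT/$1.lock"
    # mkdir is atomic: it fails if the directory already exists.
    if ! mkdir "$lock_dir" 2>/dev/null; then
        echo "$1 is already running, giving up" >&2
        exit 1
    fi
}

# unlock NAME: release the lock taken by lock NAME.
unlock() {
    rmdir "$LOCK_ROOT/$1.lock"
}
```

A sketch like this leaves a stale lock behind if the script dies between lock and unlock; calling unlock from a trap on EXIT would be one way to guard against that.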
Now you can use your mail client to move mail to sa_spam or sa_ham to teach SpamAssassin to recognize it as spam or ham, respectively.
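Since the training script expects the Spam, sa_spam and sa_ham folders to be present, they have to be created once per account. The following sketch does this directly on disk, assuming the Maildir layout used by the script (a mail directory below each user directory, with folders stored as dot-prefixed subdirectories). The helper name create_training_folders and the example path are made up for illustration.

```shell
#!/bin/bash
# Sketch: create the Maildir folders the training script expects.
# create_training_folders is a hypothetical helper, not an existing tool.
create_training_folders() {
    local user_dir=$1    # e.g. /var/vmail/example.org/alice
    local folder sub
    for folder in .Spam .sa_spam .sa_ham; do
        # Every Maildir folder consists of the subdirectories cur, new and tmp.
        for sub in cur new tmp; do
            mkdir -p "$user_dir/mail/$folder/$sub"
        done
    done
}

# Example call (run as root, then hand the folders to the vmail user):
#   create_training_folders /var/vmail/example.org/alice
#   chown -R vmail /var/vmail/example.org/alice/mail
```

Depending on the Dovecot configuration, the new folders may also need to be subscribed before a mail client shows them, so creating them through Dovecot itself (for example with doveadm mailbox create) may be the more convenient route.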