Spamassassin - Learning Options

auto_whitelist_factor n (default: 0.5, range [0..1])
How far to move a message's score toward the long-term mean for its sender. Basically, the algorithm tracks the long-term mean score of messages from the sender (mean), and then, once we have otherwise fully calculated the score for this message (score), we calculate the final score for the message as:

finalscore = score + (mean - score) * factor

So if factor = 0.5, then we'll move halfway between the calculated score and the mean. If factor = 0.3, then we'll move about 1/3 of the way from the score toward the mean. factor = 1 means just use the long-term mean; factor = 0 means just use the calculated score.
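
As a rough illustration, the regression formula can be written as a small Python function. The name final_score is hypothetical; this is only a sketch of the arithmetic above, not SpamAssassin's own (Perl) code.

```python
def final_score(score: float, mean: float, factor: float = 0.5) -> float:
    """Regress a message's score toward the sender's long-term mean.

    Hypothetical helper for: finalscore = score + (mean - score) * factor
    """
    if not 0.0 <= factor <= 1.0:
        raise ValueError("factor must be in the range [0, 1]")
    return score + (mean - score) * factor


# A message scoring 10.0 from a sender whose long-term mean is 2.0:
print(final_score(10.0, 2.0, 0.5))  # halfway: 6.0
print(final_score(10.0, 2.0, 0.0))  # calculated score only: 10.0
print(final_score(10.0, 2.0, 1.0))  # long-term mean only: 2.0
```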

bayes_auto_learn ( 0 | 1 ) (default: 1)
Whether SpamAssassin should automatically feed high-scoring mails (or low-scoring mails, for non-spam) into its learning systems. The only learning system supported currently is a naive-Bayesian-style classifier.

Note that certain tests are ignored when determining whether a message should be trained upon:

- auto-whitelist (AWL)
- rules with tflags set to 'learn' (the Bayesian rules)
- rules with tflags set to 'userconf' (user white/black-listing rules, etc.)

Also note that auto-training occurs using scores from either scoreset 0 or 1, depending on what scoreset is used during message check. It is likely that the message check and auto-train scores will be different.
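
The exclusion rule above can be sketched in Python as follows. This is a simplified model, not SpamAssassin's implementation: the RuleHit structure, the learning_score function, and the rule names and scores in the example are all hypothetical, and it assumes the auto-whitelist adjustment appears as a hit named AWL. Scoreset selection is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class RuleHit:
    """One rule that fired on a message (hypothetical structure)."""
    name: str
    score: float
    tflags: set[str] = field(default_factory=set)


# tflags whose rules are skipped when deciding whether to auto-learn
IGNORED_TFLAGS = {"learn", "userconf"}


def learning_score(hits: list[RuleHit]) -> float:
    """Sum rule scores, leaving out the tests ignored for auto-learning."""
    total = 0.0
    for hit in hits:
        if hit.name == "AWL":  # assumed name of the auto-whitelist adjustment
            continue
        if hit.tflags & IGNORED_TFLAGS:  # Bayes ('learn') and 'userconf' rules
            continue
        total += hit.score
    return total


# Made-up rule names and scores, for illustration only
example_hits = [
    RuleHit("HYPOTHETICAL_BODY_RULE", 2.5),
    RuleHit("HYPOTHETICAL_HEADER_RULE", 1.0),
    RuleHit("HYPOTHETICAL_BAYES_RULE", 3.5, {"learn"}),
    RuleHit("HYPOTHETICAL_USER_RULE", -50.0, {"userconf"}),
    RuleHit("AWL", -1.2),
]
print(learning_score(example_hits))  # only the first two rules count: 3.5
```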

bayes_auto_learn_threshold_nonspam n.nn (default: 0.1)
The score threshold below which a mail has to score to be fed automatically into SpamAssassin's learning systems as a non-spam message.

bayes_auto_learn_threshold_spam n.nn (default: 12.0)
The score threshold above which a mail has to score to be fed automatically into SpamAssassin's learning systems as a spam message.

Note: SpamAssassin requires at least 3 points from the header, and 3 points from the body to auto-learn as spam. Therefore, the minimum working value for this option is 6.
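
Putting the two thresholds and the header/body requirement together, the auto-learn decision might look roughly like the sketch below. It is a simplified model under stated assumptions: auto_learn_decision is a hypothetical name, "below" and "above" are read as strict comparisons, "at least 3 points" is read as >= 3, and any further checks SpamAssassin performs are not represented.

```python
from typing import Optional


def auto_learn_decision(
    score: float,
    header_points: float,
    body_points: float,
    threshold_nonspam: float = 0.1,
    threshold_spam: float = 12.0,
) -> Optional[str]:
    """Return "ham", "spam", or None (do not auto-learn). Simplified sketch."""
    if score < threshold_nonspam:
        return "ham"
    # Spam needs a high enough score plus at least 3 header and 3 body points
    if score > threshold_spam and header_points >= 3.0 and body_points >= 3.0:
        return "spam"
    return None


print(auto_learn_decision(-1.0, 0.0, 0.0))  # ham
print(auto_learn_decision(13.0, 5.0, 8.0))  # spam
print(auto_learn_decision(13.0, 1.0, 12.0))  # None: too few header points
```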

bayes_ignore_header header_name
If you receive mail filtered by upstream mail systems, like a spam-filtering ISP or mailing list, and that service adds new headers (as most of them do), these headers may provide inappropriate cues to the Bayesian classifier, allowing it to take a "shortcut". To avoid this, list the headers using this setting. Example:

        bayes_ignore_header X-Upstream-Spamfilter
        bayes_ignore_header X-Upstream-SomethingElse
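
Conceptually, the effect is that the listed headers never reach the classifier. The Python sketch below shows that idea with the standard email module; it does not reproduce SpamAssassin's tokenizer, and strip_ignored_headers is a hypothetical helper.

```python
from email import message_from_string
from email.message import Message


def strip_ignored_headers(raw: str, ignored: list[str]) -> Message:
    """Parse a message and drop every header named in `ignored`."""
    msg = message_from_string(raw)
    for name in ignored:
        del msg[name]  # removes all occurrences; no error if the header is absent
    return msg


raw = (
    "From: a@example.com\n"
    "X-Upstream-Spamfilter: yes\n"
    "Subject: hi\n"
    "\n"
    "body\n"
)
msg = strip_ignored_headers(raw, ["X-Upstream-Spamfilter", "X-Upstream-SomethingElse"])
print(msg.keys())  # ['From', 'Subject']
```
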
bayes_min_ham_num (Default: 200)
bayes_min_spam_num (Default: 200)
To keep its results accurate, the Bayes system does not activate until a certain number of ham (non-spam) and spam messages have been learned. The default is 200 each of ham and spam, but you can tune these up or down with these two settings.
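
The activation condition can be expressed as a one-line check. This sketch assumes "a certain number" means "at least" that many of each; bayes_is_active is a hypothetical name, not part of SpamAssassin.

```python
def bayes_is_active(learned_ham: int, learned_spam: int,
                    min_ham: int = 200, min_spam: int = 200) -> bool:
    """True once enough ham AND enough spam have been learned (sketch)."""
    return learned_ham >= min_ham and learned_spam >= min_spam


print(bayes_is_active(250, 120))  # False: not enough spam yet
print(bayes_is_active(200, 200))  # True
```
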
bayes_learn_during_report (Default: 1)
The Bayes system will, by default, learn any reported messages (spamassassin -r) as spam. If you do not want this to happen, set this option to 0.