|Document ID:2003101710425654 |
Regular expressions you can use to set up spam rules in Symantec Mail Security for Microsoft Exchange
|Situation:||You want to create content filtering rules by using regular expressions to set up spam rules for Symantec Mail Security 4.x or 5.x for Microsoft Exchange.|
|Solution:||Creating rules under the Filter Policy allows the administrator to filter on the subject, sender, or message body in Symantec Mail Security for Microsoft Exchange. When creating a rule, you can use UNIX-based regular expressions (characters and symbols) to flag variations of a subject line or sender. UNIX-based regular expressions work only in the Spam Rule. |
Some DOS-based expressions (wildcards), as noted in the following table, do not work in the Spam Rule; only the UNIX-based regular expressions do. The independent Subject Line rule (not the spam rules created for the subject field) can use only the DOS-based expressions.
- The Content Rule cannot use either DOS-based or UNIX-based regular expressions; it supports only the standard characters A-Z and 0-9.
- Before applying any rules to your environment regarding email blocking or content filtering, Symantec recommends that you test the potential effects by setting the rule to Log Only or by testing in a nonproduction environment.
The spam lists support a wide range of regular expressions. The entries in the Spam List settings (subject list and sender list) use UNIX-style regular expressions, which differ from DOS-style wildcard matching. For example, you can match:
- Any number of any characters (.*)
- Zero or one of any character (.?)
For example, the regular expression string .*spam.? matches spam, bigspam, spam7, and bigspam7.
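To try an entry before adding it to a list, you can reproduce the match in a scripting language. The following Python sketch uses the standard `re` module, whose syntax covers the symbols in this article but is not the product's matching engine. It assumes that an entry matches if it is found anywhere in the subject (substring search); `is_spam_subject` is a hypothetical helper name.

```python
import re

# Assumption: the entry matches if it is found anywhere in the subject,
# which is how Python's re.search behaves.
PATTERN = re.compile(r".*spam.?")

def is_spam_subject(subject: str) -> bool:
    """Return True if the spam-list entry matches the subject."""
    return PATTERN.search(subject) is not None

for subject in ["spam", "bigspam", "spam7", "bigspam7", "hello"]:
    print(subject, is_spam_subject(subject))
```

Under substring search, `.*spam.?` behaves the same as plain `spam`, because the leading `.*` and the trailing `.?` can always match nothing.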
The following table lists the symbol, name, platform, a brief description, and an example of the regular expression:
|^||Must start with.||UNIX||The line must start with what follows the ^.||^free matches any string that begins with "free". For example: Free money today!|
|$||Must end with.||UNIX||The line must end with what precedes the $.||today$ matches any string that ends with "today". For example: Buy it today|
|.||Match any character at this position.||UNIX||Matches any single character where the period is located.||l..k matches "link", "look", "lank", "lark", "lo k", "l  k" (two spaces), "l33k", etc., but not "latchbock".|
|?||0 or 1 instance of a character.||UNIX||Matches 0 or 1 instance of the character to the immediate left of the question mark.||lo?k matches "lok" or "lk", but not "lock", "look" or "loooook".|
|?||Match any character at this position.||DOS||Matches exactly one character, of any kind, at the current position.||lo?k matches "look", "lock", "lork", "lo1k", etc., but not "lok".|
|*||0 or more instances of a character.||UNIX||Matches 0 or more instances of the character to the immediate left of the asterisk.||a*k matches "k", "ak", "aaaak", "aaaaaaaaaaaaaaaak", etc., but not "ack" or "ik."|
|*||Match all.||DOS||Matches any characters, for any length.||*k matches "k", "lock", "pack", "network", "overwork", etc.|
Note: To enter a UNIX-style "catch-all" equivalent to the DOS "*", enter ".*", which matches 0 or more instances of any character.
|+||1 or more instances of a character.||UNIX||Matches 1 or more instances of the character to the immediate left of the plus sign.||b+e matches "be", "bbe", "bbbbbbbbe", etc., but not "brie", "bee", or "e".|
|[ ]||Match only the characters listed within the brackets.||UNIX||Matches any one of the characters listed within the brackets. All symbols within the brackets are implicitly escaped, except "^". (Case sensitive)||[bhmy]e matches "be", "he", "me", "ye", "mye", "bye", "hbmye", etc., but not "humble" or "e".|
|[^]||Match all characters except the ones listed within the brackets.||UNIX||Matches any one character except those listed within the brackets. All symbols within the brackets are implicitly escaped, except "^". (Case sensitive)||be[^s]t matches "belt", "beat", "bert", "beAt", "be4t", "beSt", but not "best".|
|( )||Override precedence.||UNIX||Used to override the precedence of the regular expression symbols.||((\$.*!)|(!.*\$)) groups three separate parts, which are evaluated before the whole expression is applied to a string:|
(\$.*!): a "$" dollar sign, followed by anything (".*"), followed by an "!" exclamation point.
(!.*\$): an "!" exclamation point, followed by anything (".*"), followed by a "$" dollar sign.
(...|...): either the first part "|" or the second part.
So "$Free money today!" and "!Unbelievable offer$" are caught, but "Important message!" and "Time to think about the $." are not.
|"|" (pipe)||Or.||UNIX||Either the expression to the left of the pipe or the expression to the right of the pipe must match for the whole expression to be true.||this|that matches any string containing "this" or "that".|
|\ (backslash)||Escape.||UNIX||Placed in front of a symbol so that the literal character is used instead of the symbol's meaning. For example, \$ matches a literal "$" instead of acting as the end-of-line anchor.||free\$ matches "free$" anywhere in the string and does not require "free" to be the last word in the string.|
|\s||Space.||UNIX||Matches a space; place it in the expression where a space is needed.||RE\s\[ matches the text "RE [". The "[" is escaped with a backslash so that it is not read as the start of a list.|
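The UNIX-style rows of the table can be checked mechanically. This Python sketch runs most of the table's examples through the standard `re` module, which is only a stand-in for the product's engine. It assumes substring search (`re.search`); "free money today!" is lowercased because `re` is case-sensitive by default, and the strings "get free money", "today only", "free$ offer", "free offer", "RE [urgent]", and "RE: urgent" are made-up test inputs. The helper names are hypothetical.

```python
import re

# (pattern, texts the table says match, texts it says do not match)
EXAMPLES = [
    (r"^free", ["free money today!"], ["get free money"]),
    (r"today$", ["Buy it today"], ["today only"]),
    (r"l..k", ["link", "look", "l33k"], ["latchbock"]),
    (r"lo?k", ["lok", "lk"], ["lock", "look", "loooook"]),
    (r"[bhmy]e", ["be", "bye", "hbmye"], ["humble", "e"]),
    (r"be[^s]t", ["belt", "beAt", "be4t"], ["best"]),
    (r"(\$.*!)|(!.*\$)", ["$Free money today!", "!Unbelievable offer$"],
     ["Important message!", "Time to think about the $."]),
    (r"free\$", ["free$ offer"], ["free offer"]),
    (r"RE\s\[", ["RE [urgent]"], ["RE: urgent"]),
]

def behaves_as_documented(pattern, positives, negatives):
    """True if re.search finds every positive text and none of the negatives."""
    regex = re.compile(pattern)
    return (all(regex.search(t) for t in positives)
            and not any(regex.search(t) for t in negatives))

def search_vs_fullmatch(pattern, text):
    """Return (found anywhere in text, matches the whole text)."""
    return (re.search(pattern, text) is not None,
            re.fullmatch(pattern, text) is not None)

for pattern, pos, neg in EXAMPLES:
    print(pattern, behaves_as_documented(pattern, pos, neg))

# The "but not" cases for * and + hold only when the whole text must match:
print(search_vs_fullmatch(r"a*k", "ack"))  # (True, False)
print(search_vs_fullmatch(r"b+e", "bee"))  # (True, False)
```

The last two lines show why the * and + rows depend on how an entry is applied: a substring search finds the "k" in "ack" (zero "a" characters) and the "be" in "bee", so those "but not" examples are only excluded when the entire text has to match.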
Note: Regular expressions are the default for the "Text value" field, but when you configure a Match List you can choose regular expressions, DOS wildcards, or literal strings.
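The three Match List options read the same characters differently. The following Python sketch is a hypothetical model, not the product's implementation: `entry_matches` and its mode names are invented for illustration, `fnmatch.fnmatchcase` stands in for DOS wildcards (whole-text, case-sensitive), and substring search is assumed for regular expressions and literal strings.

```python
import fnmatch
import re

def entry_matches(mode: str, entry: str, text: str) -> bool:
    """Hypothetical model of one Match List entry in a given mode."""
    if mode == "regex":
        # UNIX-style regular expression, found anywhere in the text.
        return re.search(entry, text) is not None
    if mode == "dos":
        # DOS wildcards: * is any run of characters, ? is exactly one
        # character, and the whole text must match (case-sensitive).
        return fnmatch.fnmatchcase(text, entry)
    if mode == "literal":
        # Every character, including $ . * ?, is taken literally.
        return entry in text
    raise ValueError(f"unknown mode: {mode!r}")

for text in ["lok", "look"]:
    print(text, entry_matches("regex", "lo?k", text), entry_matches("dos", "lo?k", text))
# lok True False
# look False True
```

The same entry, lo?k, matches "lok" but not "look" as a regular expression, and the reverse as a DOS wildcard, which mirrors the two "?" rows in the table.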
When a spam rule uses regular expressions, the symbols are evaluated in the following order of precedence, from highest to lowest:
( ) Precedence override
[ ] List
^ Start with
$ End with
. Match position
? Zero or one instance
* Zero or more instances
+ One or more instances
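The effect of the ( ) override is easiest to see with a quantifier. This Python sketch uses `re.fullmatch` so that the whole string must match; it assumes Python's precedence for these symbols agrees with the product's, and `whole_match` is a hypothetical helper.

```python
import re

def whole_match(pattern: str, text: str) -> bool:
    """True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

# + binds only to the single character on its left ("b"), not to "ab".
print(whole_match(r"ab+", "abbb"))    # True
print(whole_match(r"ab+", "abab"))    # False
# Parentheses override precedence, so + repeats the whole group.
print(whole_match(r"(ab)+", "abab"))  # True
```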
Examples of combined regular expressions
The following examples show a few ways to combine regular expressions:
Note: Symantec strongly recommends testing in a nonproduction environment or setting the spam rule to log only to avoid blocking or deleting valid email.
- You have lately been flooded with email with similar words in varying positions within the subject, but you do not want to make the rule too broad and end up blocking valid email.
Example: Three separate email messages with these subject lines: "BadItem thinks you need to need this BadThing," "Wouldn't a BadItem Badthing benefit your life," and "Badthing from BadItem is waiting for you today."
To block all email with both "BadItem" and "BadThing" in the same subject line, implement a rule with regular expressions like:
Any string that contains both the words "BadThing" and "BadItem" is then caught and blocked by SMS for Exchange.
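One expression that fits this description lists both word orders as alternatives, with anything allowed before, between, and after the words. The Python sketch below is an illustrative construction, not necessarily the expression the article intends. It assumes case-insensitive matching (`re.IGNORECASE`), which is needed to catch "Badthing" in the second and third sample subjects; if the product matches case-sensitively, each capitalization would have to be listed. The fourth subject is a made-up negative example.

```python
import re

# Illustrative construction: either word may come first.
# Assumption: case-insensitive matching, so "Badthing" is also caught.
BOTH_WORDS = re.compile(r"(.*BadItem.*BadThing.*)|(.*BadThing.*BadItem.*)",
                        re.IGNORECASE)

def has_both_words(subject: str) -> bool:
    return BOTH_WORDS.search(subject) is not None

subjects = [
    "BadItem thinks you need to need this BadThing",
    "Wouldn't a BadItem Badthing benefit your life",
    "Badthing from BadItem is waiting for you today",
    "BadItem newsletter",  # only one of the two words: not caught
]
for s in subjects:
    print(has_both_words(s), s)
```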
- You have numerous email messages that have a static word with a variable suffix that always comes at the start of a subject. For example: free, freed, freer, freeing, freeforyouall, freeforall.
Examples of subject lines: "Free calling cards," "Freeforall on unused domain names," or "Freeing up time for you to get more spam."
To block all email with a subject line that begins with some variant of the word "free," implement a regular expression with this context:
This would block any email whose subject begins with the word "free" or with a longer word that starts with "free."
WARNING: The above expression also blocks email messages whose subjects begin with words such as freedom, freeze, freelance, and freehand.
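A start-anchored expression such as ^free.* fits this description. The Python sketch below uses that expression as an assumption (it is not quoted from the article) together with case-insensitive matching, so that "Free" at the start of a subject is caught. It also reproduces the over-match described in the warning; "Freedom of speech", "Freelance work", and "Get free calling cards" are made-up test subjects.

```python
import re

# Illustrative expression (an assumption): "free" at the start, then anything.
# Assumption: case-insensitive matching, so "Free" is caught as well.
STARTS_WITH_FREE = re.compile(r"^free.*", re.IGNORECASE)

def blocked(subject: str) -> bool:
    return STARTS_WITH_FREE.search(subject) is not None

print([blocked(s) for s in ["Free calling cards",
                            "Freeforall on unused domain names",
                            "Freeing up time for you to get more spam"]])  # [True, True, True]
print([blocked(s) for s in ["Freedom of speech", "Freelance work"]])       # [True, True]
print(blocked("Get free calling cards"))  # False: "free" is not at the start
```

The ^ anchor does the real work here; the trailing .* only makes explicit that anything may follow.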
- You are certain that you want to block a specific word, but do not want to block words that contain a valid variant of the base word.
Example: You would like to block the word "rat," but you do not want to block words such as crater, rate, fraternity, or operation.
To block only the word "rat" and avoid blocking email with valid words, implement a regular expression with this context:
(^rat .*)|(.* rat .*)|(.* rat$) or (^rat[ ].*)|(.*[ ]rat[ ].*)|(.*[ ]rat$)
Note: In the above expressions, there is a space between the square brackets. For example, the expression (^rat[ ].*) is left parenthesis, caret, rat, left square bracket, space, right square bracket, period, asterisk, and right parenthesis.
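The bracketed form can be checked against the word list above. In this Python sketch, [ ] is a list containing a single space, so "rat" must appear as a separate word at the start, in the middle, or at the end of the subject. The test subjects "rat problem", "there is a rat here", and "I saw a rat" are made up, and `mentions_rat` is a hypothetical helper.

```python
import re

# [ ] is a list holding one space character, so "rat" must be a separate word.
RAT = re.compile(r"(^rat[ ].*)|(.*[ ]rat[ ].*)|(.*[ ]rat$)")

def mentions_rat(subject: str) -> bool:
    return RAT.search(subject) is not None

for s in ["rat problem", "there is a rat here", "I saw a rat",
          "crater", "rate", "fraternity", "operation"]:
    print(mentions_rat(s), s)
```

None of the three alternatives matches a subject that consists only of the word "rat", because each one requires a space next to the word; adding a fourth alternative, (^rat$), would cover that case.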
- You are receiving messages from senders in a specific source domain or in its subdomains. Examples of senders: email@example.com, firstname.lastname@example.org, or email@example.com.
To block all messages from a given domain, including any subdomains, implement a regular expression with this context:
.*baddomain\.bad (for single-level domains)
.*baddomain\.subdomain\.bad (for multilevel domains)
This blocks senders not only from the source domain, but also from any subdomains of the base domain.
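Because .*baddomain\.bad can match anywhere in the sender address, it also catches look-alike domains that merely end in "baddomain.bad". The Python sketch below shows this with made-up sender addresses and compares a hypothetical tighter variant, [@.]baddomain\.bad$, which requires an "@" or "." immediately before the domain name and nothing after it. The variant assumes that the product accepts [ ] and $ in sender entries as the table describes; it has not been tested in the product.

```python
import re

SINGLE = re.compile(r".*baddomain\.bad")
# Hypothetical tighter variant: "@" or "." right before the domain name,
# and nothing after it, so look-alike domains are not caught.
TIGHT = re.compile(r"[@.]baddomain\.bad$")

senders = ["user@baddomain.bad", "user@mail.baddomain.bad", "user@notbaddomain.bad"]
for sender in senders:
    print(sender, bool(SINGLE.search(sender)), bool(TIGHT.search(sender)))
# user@baddomain.bad True True
# user@mail.baddomain.bad True True
# user@notbaddomain.bad True False
```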
Spam list issues and limitations
The spam list implementation has several issues and limitations:
- DBCS characters are not supported in the sender and subject spam lists.
- Lists do not allow an expression to begin with * or ? because the spam filter resolves this to .* or .?.
- There are certain combinations of regular expressions that are invalid. Unsupported combinations will disappear from the list after saving and reloading the page.
- Care must be taken not to create rules that match everything and everyone as spam. Symantec suggests changing spam rules with caution and always testing in a nonproduction environment first. Symantec generally recommends that you avoid the following types of policy settings:
- A size rule set so low that it triggers on every attachment and message body
- A subject line rule that is too broad or triggers on notifications
- An attachment name rule that is too broad or triggers on replacement text file names
- Items added to a spam list that are too broad or trigger on notifications
- A content filter threshold that is set too low, or words added to a content dictionary that occur too frequently
- Domain blocking does not work in all cases due to VSAPI limitations. For more information, see the document Known General Scan and VSAPI issues in Symantec AntiVirus/Filtering 3.0 for Microsoft Exchange.
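The first limitations in this list can be checked before an entry is saved. This Python sketch is hypothetical: `normalize_entry` models the documented rewriting of a leading * or ? to .* or .?, and `lint_entry` uses Python's `re.compile` as a stand-in validity check (Python may accept or reject some combinations differently from the product). It also flags entries that match an empty string, a rough sign of a catch-all rule that would match everything.

```python
import re

def normalize_entry(entry: str) -> str:
    """Model of the documented behavior: a leading * or ? becomes .* or .?"""
    if entry[:1] in ("*", "?"):
        return "." + entry
    return entry

def lint_entry(entry: str) -> list[str]:
    """Hypothetical pre-save check for a spam-list entry; returns warnings."""
    try:
        regex = re.compile(entry)
    except re.error as exc:
        # A bare leading * or ? has nothing to repeat, so re rejects it.
        return [f"invalid expression: {exc}"]
    warnings = []
    if regex.search("") is not None:
        warnings.append("matches an empty subject, so it likely matches everything")
    return warnings

print(lint_entry("*spam"))                   # one "invalid expression" warning
print(lint_entry(normalize_entry("*spam")))  # []
print(lint_entry(".*"))                      # catch-all warning
```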