Reputation: 303
In my fight against spam mail, I want to check whether an email contains a link to a website in its body.
If the mail does contain a link, I want to make a DNS lookup against Spamhaus (or a similar DNS blocklist service) to check whether the lookup for the linked domain returns an A record of the form 127.x.x.x.
See examples of the IP addresses in use at Spamhaus here
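The lookup described above could be sketched roughly as follows in Python. This is only an illustration under assumptions: the zone name dbl.spamhaus.org, the helper names dnsbl_query_name, is_listing_address and is_listed, and the injectable resolve parameter are not from my setup; they are placeholders for whatever service and tooling you actually use.

```python
import ipaddress
import socket

# Assumed zone name for illustration; substitute the DNSBL you actually use.
DEFAULT_ZONE = "dbl.spamhaus.org"


def dnsbl_query_name(domain: str, zone: str = DEFAULT_ZONE) -> str:
    """Build the name to query: the linked domain prepended to the zone."""
    return f"{domain.strip('.').lower()}.{zone}"


def is_listing_address(address: str) -> bool:
    """Return True if the answer is an IPv4 address of the form 127.x.x.x."""
    try:
        ip = ipaddress.IPv4Address(address)
    except ipaddress.AddressValueError:
        return False
    return ip in ipaddress.IPv4Network("127.0.0.0/8")


def is_listed(domain: str, zone: str = DEFAULT_ZONE,
              resolve=socket.gethostbyname) -> bool:
    """Resolve the query name; a failed lookup is treated as 'not listed'."""
    try:
        answer = resolve(dnsbl_query_name(domain, zone))
    except socket.gaierror:
        return False
    return is_listing_address(answer)
```

Individual services assign their own meanings to specific 127.x.x.x values, so a real check should consult the service's documented return codes rather than treat every 127.x.x.x answer the same way.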
My first attempt at detecting a link is a regular expression that checks whether the body of a mail contains anything beginning with http:// or https:// and ending with a / or ?.
I ended up with the following regex by using RegEx101.com:
https?:\/\/((\w|\d)+\.)+\w+
If I got this right, it should match any domain made up of alphanumeric characters, or alternatively just an IP address.
The regex matches any string with the following syntax:
http followed by an optional s, then ://.
Then one or more word characters (\w, which covers letters, digits and the underscore, so the \d alternative is redundant) followed by a period; this group has to match at least once. This is the domain name or subdomain name in the link.
Then one or more word characters. This is the TLD part of the domain name.
Finally, the link is expected to end with a / or ?, although the regex above does not check for this.
I am well aware that a link does not need to have a subpath.
That is an issue for another day. :-)
Similarly, I will leave out support for links to IPv4 and IPv6 addresses. :-)
I do not care about anything that comes after the / or ? because it has no influence on the subsequent DNS lookup.
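To see what the pattern actually picks up, here is a small Python sketch that applies it to the kind of text found in the mail. QUESTION_PATTERN is the regex as written; HOST_PATTERN, first_link and hosts are hypothetical names added for illustration, and the hyphen-tolerant variant is an assumption about what one might want, not something SpamAssassin requires.

```python
import re

# The pattern as written, expressed as a Python raw string.
QUESTION_PATTERN = re.compile(r"https?:\/\/((\w|\d)+\.)+\w+", re.IGNORECASE)

# Hypothetical variant: captures only the host name and also
# allows hyphens inside the labels of the domain.
HOST_PATTERN = re.compile(r"https?://((?:[\w-]+\.)+\w+)", re.IGNORECASE)

SAMPLE = 'src="https://u11846957.ct.sendgrid.net/wf/open?..."'


def first_link(text: str):
    """Return the first match of the original pattern, or None."""
    match = QUESTION_PATTERN.search(text)
    return match.group(0) if match else None


def hosts(text: str):
    """Return every host name found by the hypothetical variant."""
    return HOST_PATTERN.findall(text)


print(first_link(SAMPLE))                      # https://u11846957.ct.sendgrid.net
print(first_link("http://my-site.example.com/"))  # None: '-' is not matched by \w
print(hosts("http://my-site.example.com/"))    # ['my-site.example.com']
```

The match stops at the first / because neither \w nor \. matches it, which is why the path and query string never end up in the result. The captured host is the part that would be handed to the DNS lookup.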
This led me to add the following lines to local.cf:
body BADLINKS /https?:\/\/((\w|\d)+\.)+\w+/i
describe BADLINKS Email contains weblinks
score BADLINKS 0.1
After restarting SpamAssassin, I tested whether the rule matched by running the command: cat linktest | spamc -R
The file linktest contains a confirmed spam mail with 3 links inside the body.
However, I cannot see that my rule was triggered.
This was the output from running the command:
pts  rule name                     description
---- ----------------------------- --------------------------------------------------
1.1  URIBL_GREY                    Contains an URL listed in the URIBL greylist
                                   [URIs: sendgrid.net]
0.0  HEADER_FROM_DIFFERENT_DOMAINS From and EnvelopeFrom 2nd level
                                   mail domains are different
0.0  SPF_HELO_NONE                 SPF: HELO does not publish an SPF Record
0.0  HTML_MESSAGE                  BODY: HTML included in message
0.1  MIME_HTML_ONLY                BODY: Message only has text/html MIME parts
-0.1 DKIM_VALID                    Message has at least one valid DKIM or DK signature
-0.1 DKIM_VALID_EF                 Message has a valid DKIM or DK signature from
                                   envelope-from domain
0.1  DKIM_SIGNED                   Message has a DKIM or DK signature, not necessarily
                                   valid
0.0  T_KAM_HTML_FONT_INVALID       Test for Invalidly Named or Formatted
                                   Colors in HTML
0.8  FROM_FMBLA_NEWDOM28           From domain was registered in last 14-28
                                   days
-0.0 DKIMWL_WL_MED                 DKIMwl.org - Medium trust sender
The mail that I am using contains, among other things, this text:
<img src=3D"https://u11846957.ct.sendgrid.net/wf/open?...">
The link inside the double quotes should be matched, except for the trailing /wf/open?..., so what am I missing?
Upvotes: 1
Views: 68
Reputation: 303
Doh!
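Word of advice: know the difference between how body and rawbody work in SpamAssassin rules.
A body rule is matched against the decoded text with the HTML tags removed, so a URL that only appears inside a tag attribute such as <img src=...> never reaches it. A rawbody rule is matched against the decoded text with the HTML markup still in place.
That is why the rule was not matched when I used body in local.cf.
What I needed to use was rawbody, like so: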
rawbody BAD_LINKS /https?:\/\/((\w|\d)+\.)+\w+/i
describe BAD_LINKS Email contains weblinks
score BAD_LINKS 0.1
After restarting SpamAssassin I got the following result:
pts  rule name                     description
---- ----------------------------- --------------------------------------------------
-0.0 RCVD_IN_MSPIKE_H2             RBL: Average reputation (+2)
                                   [149.72.232.26 listed in wl.mailspike.net]
0.0  SPF_HELO_NONE                 SPF: HELO does not publish an SPF Record
0.0  HEADER_FROM_DIFFERENT_DOMAINS From and EnvelopeFrom 2nd level
                                   mail domains are different
0.1  MIME_HTML_ONLY                BODY: Message only has text/html MIME parts
0.0  HTML_MESSAGE                  BODY: HTML included in message
0.1  BAD_LINKS                     RAW: Email contains weblinks <<<--- My rule
-0.1 DKIM_VALID_EF                 Message has a valid DKIM or DK signature from
                                   envelope-from domain
0.1  DKIM_SIGNED                   Message has a DKIM or DK signature, not necessarily
                                   valid
-0.1 DKIM_VALID                    Message has at least one valid DKIM or DK signature
1.1  URIBL_GREY                    Contains an URL listed in the URIBL greylist
                                   [URIs: sendgrid.net]
0.0  T_KAM_HTML_FONT_INVALID       Test for Invalidly Named or Formatted
                                   Colors in HTML
0.8  FROM_FMBLA_NEWDOM28           From domain was registered in last 14-28
                                   days
-0.0 DKIMWL_WL_MED                 DKIMwl.org - Medium trust sender
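For anyone who wants to see the effect outside SpamAssassin, the difference can be imitated with a few lines of Python. This is only a rough simulation using the standard library, not SpamAssassin's own rendering code: raw_view, rendered_view and TextOnly are hypothetical names, and the <p> text in the sample is invented, while the <img> tag is the one from the mail.

```python
import quopri
import re
from html.parser import HTMLParser

LINK = re.compile(r"https?://((\w|\d)+\.)+\w+", re.IGNORECASE)


class TextOnly(HTMLParser):
    """Collects only the text between tags, dropping tags and attributes."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def raw_view(part: bytes) -> str:
    """Quoted-printable decoded, markup intact (roughly a rawbody view)."""
    return quopri.decodestring(part).decode("utf-8", errors="replace")


def rendered_view(part: bytes) -> str:
    """Decoded and with tags stripped (roughly a body view)."""
    parser = TextOnly()
    parser.feed(raw_view(part))
    parser.close()
    return "".join(parser.chunks)


# Hypothetical message part: the <p> text is invented for the example.
SAMPLE = b'<p>Hello</p><img src=3D"https://u11846957.ct.sendgrid.net/wf/open?...">'

print(bool(LINK.search(raw_view(SAMPLE))))       # True: the URL is still in the markup
print(bool(LINK.search(rendered_view(SAMPLE))))  # False: only "Hello" is left
```

Once the tags are stripped, the only text left is "Hello", so there is nothing for the link pattern to match. That mirrors what happened with the body rule.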
Upvotes: 0