Cannot detect a web link in an email using SpamAssassin

In my fight against spam, I want to check whether the body of an email contains a link to a website.

If the mail does contain a link, I want to do a DNS lookup against SpamHaus, or a similar DNS blocklist service, to check whether the linked domain is listed, that is, whether the lookup returns an A record of the form 127.x.x.x.
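To make the intended lookup concrete, here is a minimal Python sketch of the idea. It assumes the blocklist is queried by prepending the link's domain to the blocklist zone; the zone `dbl.spamhaus.org` and the helper names `build_query_name`, `is_listed_answer`, and `lookup` are illustrative assumptions, not something SpamAssassin provides.

```python
import ipaddress
import socket

# Assumption: example zone only; substitute the zone of the service you use.
DEFAULT_ZONE = "dbl.spamhaus.org"


def build_query_name(domain: str, zone: str = DEFAULT_ZONE) -> str:
    """Return the DNS name to query, e.g. 'example.com.dbl.spamhaus.org'."""
    return f"{domain.strip('.').lower()}.{zone}"


def is_listed_answer(address: str) -> bool:
    """True if the returned A record lies in 127.0.0.0/8 (the 127.x.x.x form)."""
    return ipaddress.ip_address(address) in ipaddress.ip_network("127.0.0.0/8")


def lookup(domain: str, zone: str = DEFAULT_ZONE):
    """Resolve the query name; return the A record, or None if nothing resolves."""
    try:
        return socket.gethostbyname(build_query_name(domain, zone))
    except socket.gaierror:
        return None


if __name__ == "__main__":
    answer = lookup("sendgrid.net")  # performs a real DNS query
    print(answer, answer is not None and is_listed_answer(answer))
```

Some services reserve parts of the 127.x.x.x range for error codes rather than listings, so the exact return values should be checked against the documentation of the service in use.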

See examples of IP addresses in use at SpamHaus here

My first attempt at detecting a link was to create a regular expression that checks whether the body of a mail contains anything that begins with http:// or https:// and ends with a / or ?.

I ended up with the following regex by using RegEx101.com:

https?:\/\/((\w|\d)+\.)+\w+

If I got this right, it should match any domain made up of alphanumeric characters, or alternatively a bare IP address.

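The regex is meant to match any string that has the following syntax.

First it matches http:// or https://.

Then it matches one or more labels of word characters, each followed by a dot, and one final label (for example u11846957.ct.sendgrid.net).

Finally, the link was supposed to end with a / or ?, although the regex as written above does not actually require that.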


I am well aware a link does not need to have a subpath.

This is an issue for another day. :-)

Similarly, I will leave out proper support for links to IPv4 and IPv6 addresses. :-)

I do not care about anything that comes after the / or ? because it has no influence on the subsequent DNS lookup.
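As a quick sanity check outside SpamAssassin, the same pattern can be tried in Python. This is only a sketch: Python's `re` is not the Perl regex engine SpamAssassin uses, but for this simple pattern the two should behave alike. The sample string is the tag from the mail with the quoted-printable `=3D` already decoded to `=`.

```python
import re

# The pattern from the question, kept verbatim (\w already covers digits,
# so the "|\d" alternative is redundant but harmless).
LINK_RE = re.compile(r"https?:\/\/((\w|\d)+\.)+\w+", re.IGNORECASE)

sample = '<img src="https://u11846957.ct.sendgrid.net/wf/open?...">'

match = LINK_RE.search(sample)
matched_text = match.group(0)           # scheme plus host, nothing after it
host = matched_text.split("://", 1)[1]  # the part needed for the DNS lookup

print(matched_text)
print(host)

# \w does not include "-", so a hyphenated first label is not matched at all.
print(LINK_RE.search("https://my-site.example.com/"))
```

The match stops before /wf/open?..., which is what the DNS lookup needs. The last line shows one limitation to keep in mind: host names containing a hyphen are not covered by `\w`.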

This led me to add the following lines to local.cf:

body       BADLINKS     /https?:\/\/((\w|\d)+\.)+\w+/i
describe   BADLINKS     Email contains weblinks
score      BADLINKS     0.1

After restarting SpamAssassin, I tested whether the rule matched by running the command: cat linktest | spamc -R.

The file linktest contains a confirmed spam mail with 3 links inside the body.

However, I cannot see that my rule was triggered.

This was the output from running the command:

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.1 URIBL_GREY             Contains an URL listed in the URIBL greylist
                            [URIs: sendgrid.net]
 0.0 HEADER_FROM_DIFFERENT_DOMAINS From and EnvelopeFrom 2nd level
                            mail domains are different
 0.0 SPF_HELO_NONE          SPF: HELO does not publish an SPF Record
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.1 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
-0.1 DKIM_VALID             Message has at least one valid DKIM or DK signature
-0.1 DKIM_VALID_EF          Message has a valid DKIM or DK signature from
                            envelope-from domain
 0.1 DKIM_SIGNED            Message has a DKIM or DK signature, not necessarily
                            valid
 0.0 T_KAM_HTML_FONT_INVALID Test for Invalidly Named or Formatted
                            Colors in HTML
 0.8 FROM_FMBLA_NEWDOM28    From domain was registered in last 14-28
                            days
-0.0 DKIMWL_WL_MED          DKIMwl.org - Medium trust sender

The mail that I am using contains, among other things, this text:

 <img src=3D"https://u11846957.ct.sendgrid.net/wf/open?...">

The link inside the double quotes should be matched, except for the trailing /wf/open?..., so what am I missing?

Upvotes: 1

Answers (1)

Doh!

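Word of advice: Know the difference between how body and rawbody work in SpamAssassin rules. A body rule is run against the rendered text with the HTML tags removed, so a URL that only appears inside a tag attribute such as src never reaches it. A rawbody rule is run against the decoded text with the HTML markup still in place.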

The rule was not matched when I used body in local.cf.
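The difference can be illustrated with a rough Python approximation. This is not SpamAssassin's actual rendering code. The helpers `rawbody_view` and `body_view` are hypothetical stand-ins: the first only decodes quoted-printable, the second additionally drops all HTML tags and their attributes.

```python
import quopri
import re
from html.parser import HTMLParser

LINK_RE = re.compile(r"https?:\/\/((\w|\d)+\.)+\w+", re.IGNORECASE)


class TextOnly(HTMLParser):
    """Collect only the text between tags; tags and attributes are dropped."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def rawbody_view(encoded: bytes) -> str:
    """Approximation of rawbody input: decoded text, HTML markup intact."""
    return quopri.decodestring(encoded).decode("utf-8", errors="replace")


def body_view(encoded: bytes) -> str:
    """Approximation of body input: decoded text with the tags stripped."""
    parser = TextOnly()
    parser.feed(rawbody_view(encoded))
    parser.close()
    return "".join(parser.chunks)


# Quoted-printable HTML part, similar to the one in the test mail.
part = b'<p>Hello</p><img src=3D"https://u11846957.ct.sendgrid.net/wf/open?...">'

print(repr(body_view(part)), bool(LINK_RE.search(body_view(part))))
print(repr(rawbody_view(part)), bool(LINK_RE.search(rawbody_view(part))))
```

In this sketch the stripped view contains only the visible text, so the pattern has nothing to match, while the raw view still contains the src attribute with the URL.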

What I needed to use was rawbody like so:

rawbody    BAD_LINKS     /https?:\/\/((\w|\d)+\.)+\w+/i
describe   BAD_LINKS     Email contains weblinks
score      BAD_LINKS     0.1

After restarting SpamAssassin I got the following result:

 pts rule name              description
---- ---------------------- --------------------------------------------------
-0.0 RCVD_IN_MSPIKE_H2      RBL: Average reputation (+2)
                            [149.72.232.26 listed in wl.mailspike.net]
 0.0 SPF_HELO_NONE          SPF: HELO does not publish an SPF Record
 0.0 HEADER_FROM_DIFFERENT_DOMAINS From and EnvelopeFrom 2nd level
                            mail domains are different
 0.1 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
 0.0 HTML_MESSAGE           BODY: HTML included in message

 0.1 BAD_LINKS              RAW: Email contains weblinks <<<--- My rule

-0.1 DKIM_VALID_EF          Message has a valid DKIM or DK signature from
                            envelope-from domain
 0.1 DKIM_SIGNED            Message has a DKIM or DK signature, not necessarily
                            valid
-0.1 DKIM_VALID             Message has at least one valid DKIM or DK signature
 1.1 URIBL_GREY             Contains an URL listed in the URIBL greylist
                            [URIs: sendgrid.net]
 0.0 T_KAM_HTML_FONT_INVALID Test for Invalidly Named or Formatted
                            Colors in HTML
 0.8 FROM_FMBLA_NEWDOM28    From domain was registered in last 14-28
                            days
-0.0 DKIMWL_WL_MED          DKIMwl.org - Medium trust sender

Upvotes: 0
