lost baby

How to modify HTML links in a newsletter in PHP before sending them out?

I am writing a newsletter app and need to rewrite any user-defined links that appear in my clients' newsletters, so that a detected link like

<a href="http://whateverclientsite.com/">blah</a>

becomes

<a href="http://mysite.com/redirect.php?utm_source=Emails&utm_medium=MyNewsletterSubject&utm_campaign=MyNewsletterCampaign&eid=123123&mailid=234234&url=http://whateverclientsite.com/">blah</a>

My redirect.php will be a page with some Google Analytics code that fires (so I can track how many clicks the newsletters generate) and then redirects to the user-defined URL, http://whateverclientsite.com.
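A minimal sketch of building the target link in the format above, assuming PHP 7+. The helper name buildTrackingUrl() and its parameters are hypothetical. http_build_query() percent-encodes every value, so the client URL's own ? and & cannot leak into redirect.php's query string:

```php
<?php
// Hypothetical helper: builds a tracking link in the format shown above.
// http_build_query() percent-encodes every value, including the original URL.
function buildTrackingUrl(string $target, int $eid, int $mailid): string
{
    $params = [
        'utm_source'   => 'Emails',
        'utm_medium'   => 'MyNewsletterSubject',
        'utm_campaign' => 'MyNewsletterCampaign',
        'eid'          => $eid,
        'mailid'       => $mailid,
        'url'          => $target, // encoded, so its own ? and & stay inside this one value
    ];
    // Explicit '&' separator instead of relying on the arg_separator.output ini setting.
    return 'http://mysite.com/redirect.php?' . http_build_query($params, '', '&');
}

echo buildTrackingUrl('http://whateverclientsite.com/', 123123, 234234), "\n";
```

When the result is written into an href attribute, pass it through htmlspecialchars() so each & becomes &amp;. On the redirect.php side, $_GET['url'] arrives already decoded.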

I must do this link rewriting in PHP, not in client-side JavaScript, because the change has to happen before the newsletter is sent out.

What I am looking for here is the code to do the URL rewriting; the Google part I already have working. It should be a fairly simple regex operation, but my regex skills suck.

I will post back if I get it working before any answers come in.

PS: I also need to weed out certain URLs and image tags so they do not get rewritten. For instance, any link to mysite.com should not be rewritten.

PS: The whole newsletter exists as a PHP string by the time I have to process it, so
$newsletter = rewriteurls($newsletter, $url_exceptions_array);
is the function call I have in mind. My question is: how should I define rewriteurls()?

Answers (1)

madfriend

The rules for forming URLs are quite complex (see RFC 3986), and HTML attributes are complex too. But if you don't mind trading away some recall (some valid links will not be caught), here you go:

$new_feed = preg_replace(
    '@href=(?:\'|")?(?P<url>[\w?&=+/%#.:-]*)(?:\'|")?@i',
    'href="redirect.php?u=$1"', # replace this with the desired wrapper
    $your_feed); # $your_feed holds the whole newsletter HTML
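One thing to watch with the snippet above: $1 is pasted in unencoded, so a link that already has a query string (http://example.com/?a=1&b=2) would hand its b=2 to redirect.php as a separate parameter. A sketch that keeps the same pattern but encodes the capture with rawurlencode() inside preg_replace_callback(). The wrapper function name wrapLinks() is hypothetical:

```php
<?php
// Same pattern as above, but the replacement is built in a callback so the
// captured URL is percent-encoded before it goes into redirect.php's query string.
function wrapLinks(string $html): string
{
    $result = preg_replace_callback(
        '@href=(?:\'|")?(?P<url>[\w?&=+/%#.:-]*)(?:\'|")?@i',
        function (array $m): string {
            // "redirect.php?u=" is the placeholder wrapper from the answer; adjust as needed.
            return 'href="redirect.php?u=' . rawurlencode($m['url']) . '"';
        },
        $html
    );
    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $html;
}

echo wrapLinks('<a href="http://example.com/?a=1&b=2">x</a>'), "\n";
```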

This pattern gets a few cases wrong: invalid schemes (like abbbc://this.is.invalid.url) are rewritten anyway, href attributes on other tags (<link href=...>) are rewritten too, and whitespace around the equals sign (href =) is not matched at all. It's not very likely that you will encounter these cases, but if you do, extend the regular expression to cover them.
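If those cases do matter, a different technique from the regex above is to parse the newsletter with PHP's DOMDocument and touch only <a> elements. A minimal sketch of the rewriteurls($newsletter, $url_exceptions_array) function from the question, assuming the newsletter is a complete HTML document and that $url_exceptions_array holds bare host names such as 'mysite.com' (both assumptions, not part of the original answer). The redirect target is a placeholder:

```php
<?php
// Sketch: rewrite <a href> links with DOMDocument instead of a regex.
// Only <a> elements are visited, so <link href>, <img src> etc. stay untouched,
// and quoting/whitespace variants such as href = '...' are handled by the parser.
function rewriteurls(string $newsletter, array $url_exceptions_array): string
{
    $exceptions = array_map('strtolower', $url_exceptions_array);

    $doc = new DOMDocument();
    // Collect libxml warnings (unknown HTML5 tags, sloppy markup) instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($newsletter);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href'); // entities such as &amp; are already decoded here
        $host = parse_url($href, PHP_URL_HOST);

        // Leave mailto:, #anchors, relative and malformed links alone.
        if (!preg_match('~^https?://~i', $href) || !is_string($host)) {
            continue;
        }
        // Exact, case-insensitive host match against the exceptions list.
        if (in_array(strtolower($host), $exceptions, true)) {
            continue;
        }
        // Placeholder target; add utm_source, eid, mailid, ... to this array as needed.
        $query = http_build_query(['url' => $href], '', '&');
        $a->setAttribute('href', 'http://mysite.com/redirect.php?' . $query);
    }

    return (string) $doc->saveHTML();
}
```

Hosts are compared exactly, so www.mysite.com would need its own entry. Also note that loadHTML() treats the input as ISO-8859-1 unless the document declares its charset in a meta tag, so a UTF-8 newsletter should declare it.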

Here is what this regex consists of (the annotated version below will not compile as written):

@ <-- delimiter
  href=(?:\'|")? <-- href=' or href=" or bare href=
  (?P<url> <-- start of the capturing group, named "url"
    [\w?&=+/%#.:-]* <-- letters, digits, _ and any of ?&=+/%#.:- zero or more times
  )
  (?:\'|")? <-- optional closing quote of the href value
@i <-- delimiter plus the case-insensitive modifier, so HREF will work too
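For reference, the annotated breakdown can be turned into a pattern PHP will actually compile by using the x (extended) modifier, under which whitespace outside character classes is ignored and # starts a comment (inside the character class, # stays a literal). A sketch; the nowdoc only avoids escaping the quotes, and the pattern should behave exactly like the compact one-liner:

```php
<?php
// The annotated pattern, made compilable with the x (extended) modifier.
$pattern = <<<'REGEX'
@
  href=(?:'|")?        # href=' or href=" or bare href=
  (?P<url>             # capturing group named "url" (also group 1)
    [\w?&=+/%#.:-]*    # letters, digits, _ and ?&=+/%#.:- zero or more times
  )
  (?:'|")?             # optional closing quote of the href value
@ix
REGEX;

preg_match($pattern, '<a HREF="http://example.com/a#top">x</a>', $m);
echo $m['url'], "\n";
```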
