Google Analytics - Content grouping - Regex fix

Question

This is our URL structure:

http://www.disabledgo.com/access-guide/the-university-of-manchester/176-waterloo-place-2

http://www.disabledgo.com/access-guide/kingston-university/coombehurst-court-2

http://www.disabledgo.com/access-guide/kings-college-london/franklin-wilkins-building-2

http://www.disabledgo.com/access-guide/redbridge-college/brook-centre-learning-resource-centre

I am trying to create a list of groups based on the client names:

/access-guide/[this bit]/...

So I can have a performance list of all our clients.

This is my regex:

/access-guide/(.*universit(y|ies)|.*colleg(e|es))/

I want it to group anything that has university/ies or college/es in it, at any point within that client name section of the URL.

At the moment, my regex only returns groups of the form X-University:

Durham-University
Plymouth-University
Cardiff-University 
etc.

What does the regex need to be to have the list I'm looking for?

Do I need to have something at the end to stop it matching things after the client name? E.g. ([^/]+$)?

Thanks for your help in advance!

Jonathan Mee · Accepted Answer

Depending on your needs, you may want to use:

/access-guide/([^/]*(?:university|universities|college|colleges)[^/]*)/

This will match names even if "university" or "college" is not at the end of the client name, for example "college-of-the-ozarks". Note the non-capturing inner parentheses, (?:...). They should probably be used no matter which solution you go with, because you want the group to capture the whole client name, not just the word "university" or "college".
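To check the pattern outside Google Analytics, here is a minimal sketch using Python's re module. It is only an illustration: Python's regex engine is assumed to behave like the one Google Analytics uses for this pattern, the helper name client_group is made up for this example, and the third URL is a hypothetical one built around the "college-of-the-ozarks" name.

```python
import re

# Pattern from the answer: capture the whole client-name segment when one of
# the keywords appears anywhere between the two slashes.
PATTERN = re.compile(
    r"/access-guide/([^/]*(?:university|universities|college|colleges)[^/]*)/"
)

def client_group(url):
    """Return the captured client name, or None if the URL does not match.

    Hypothetical helper, used only for this illustration.
    """
    match = PATTERN.search(url)
    return match.group(1) if match else None

urls = [
    "http://www.disabledgo.com/access-guide/the-university-of-manchester/176-waterloo-place-2",
    "http://www.disabledgo.com/access-guide/kings-college-london/franklin-wilkins-building-2",
    # Hypothetical URL, not taken from the question.
    "http://www.disabledgo.com/access-guide/college-of-the-ozarks/main-hall",
]

for url in urls:
    print(client_group(url))
```

Because [^/]* cannot cross a slash, the capture stops at the end of the client name, so nothing extra such as ([^/]+$) is needed to keep it from running into the rest of the path.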


Additionally, I don't know what your client names may contain, but if they can include compound words that you want to exclude, using \b (a word boundary) may be advisable. For instance, if you don't want to match "miskatonic-postcollege", you may want to do something like this:

/access-guide/([^/]*\b(?:university|universities|college|colleges)\b[^/]*)/
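The difference between the two patterns can be seen with a quick comparison, again sketched in Python's re module purely for illustration. The names LOOSE, STRICT and matches, and the "/library" path suffix, are assumptions made for this example. It is also assumed that \b behaves the same way in the engine Google Analytics uses.

```python
import re

KEYWORDS = r"(?:university|universities|college|colleges)"

# Without word boundaries: the keyword may sit inside a longer word.
LOOSE = re.compile(r"/access-guide/([^/]*" + KEYWORDS + r"[^/]*)/")
# With \b on both sides: the keyword must stand alone as a word.
STRICT = re.compile(r"/access-guide/([^/]*\b" + KEYWORDS + r"\b[^/]*)/")

def matches(pattern, path):
    """Return True if the pattern is found anywhere in the path."""
    return pattern.search(path) is not None

# Hypothetical path built around the compound-word example.
compound = "/access-guide/miskatonic-postcollege/library"
# Path taken from one of the URLs in the question.
regular = "/access-guide/redbridge-college/brook-centre-learning-resource-centre"

print(matches(LOOSE, compound))   # keyword found inside "postcollege"
print(matches(STRICT, compound))  # no boundary between "post" and "college"
print(matches(STRICT, regular))   # hyphen and slash both count as boundaries
```

A hyphen is not a word character, so "redbridge-college" still matches the stricter pattern, while "postcollege" does not because there is no boundary between "t" and "c".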
