aden

Reputation: 29

Google Analytics - Content grouping - Regex fix

This is our URL structure:

http://www.disabledgo.com/access-guide/the-university-of-manchester/176-waterloo-place-2

http://www.disabledgo.com/access-guide/kingston-university/coombehurst-court-2

http://www.disabledgo.com/access-guide/kings-college-london/franklin-wilkins-building-2

http://www.disabledgo.com/access-guide/redbridge-college/brook-centre-learning-resource-centre

I am trying to create a list of groups based on the client names

/access-guide/[this bit]/...

So I can have a performance list of all our clients.

This is my regex:

/access-guide/(.*universit(y|ies)|.*colleg(e|es))/

I want it to group anything that has university/ies or college/es in it, at any point within that client name section of the URL.

At the moment, my current regex will only return groups that are X-University:

Durham-University
Plymouth-University
Cardiff-University 
etc.

What does the regex need to be to have the list I'm looking for?

Do I need to have something at the end to stop it matching things after the client name? E.g. ([^/]+$)?

Thanks for your help in advance!

Upvotes: 1

Views: 583

Answers (2)

Jonathan Mee

Reputation: 38919

Depending upon your needs you may want to do:

/access-guide/([^/]*(?:university|universities|college|colleges)[^/]*)/

This will match names even if "university" or "college" is not at the end of the client name, for example "college-of-the-ozarks". Note the non-capturing inner parentheses: they should probably be used whichever solution you go with, as you don't want to capture just the word "university" or "college".
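
As a quick check, here is a minimal sketch that runs the pattern above against the four URLs from the question. It uses Python's re module purely for illustration; Google Analytics has its own regex engine, so treat the result as indicative rather than definitive. The helper name client_name is made up for this example.

```python
import re

# Pattern from the answer above; the outer group captures the whole client segment.
PATTERN = re.compile(
    r"/access-guide/([^/]*(?:university|universities|college|colleges)[^/]*)/"
)

# Sample URLs taken from the question.
urls = [
    "http://www.disabledgo.com/access-guide/the-university-of-manchester/176-waterloo-place-2",
    "http://www.disabledgo.com/access-guide/kingston-university/coombehurst-court-2",
    "http://www.disabledgo.com/access-guide/kings-college-london/franklin-wilkins-building-2",
    "http://www.disabledgo.com/access-guide/redbridge-college/brook-centre-learning-resource-centre",
]


def client_name(url):
    """Return the first capture group (the client segment), or None if no match."""
    match = PATTERN.search(url)
    return match.group(1) if match else None


groups = [client_name(u) for u in urls]
for name in groups:
    print(name)
```

Each URL yields the full client segment, including any text before or after the keyword.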


Additionally, I don't know what may be in your URLs, but if they may contain compound words that you want to exclude, using \b may be advisable. For instance, if you don't want to match "miskatonic-postcollege", you may want to do something like this:

/access-guide/([^/]*\b(?:university|universities|college|colleges)\b[^/]*)/
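
To see what the \b adds, here is a small sketch comparing the two patterns from this answer. Python's re module is an assumption made for illustration (the Analytics engine may behave differently), and the page segments after each client name are hypothetical.

```python
import re

KEYWORDS = r"(?:university|universities|college|colleges)"

# First pattern from this answer: keyword may sit anywhere inside the segment.
LOOSE = re.compile(r"/access-guide/([^/]*" + KEYWORDS + r"[^/]*)/")
# Second pattern: keyword must be delimited by word boundaries.
STRICT = re.compile(r"/access-guide/([^/]*\b" + KEYWORDS + r"\b[^/]*)/")

# Client names come from the answer; the trailing page segments are made up.
paths = [
    "/access-guide/college-of-the-ozarks/main-hall",
    "/access-guide/miskatonic-postcollege/library",
]


def first_group(pattern, path):
    """Return capture group 1, or None when the pattern does not match."""
    match = pattern.search(path)
    return match.group(1) if match else None


loose_results = [first_group(LOOSE, p) for p in paths]
strict_results = [first_group(STRICT, p) for p in paths]
print(loose_results)
print(strict_results)
```

The hyphen counts as a non-word character, so "college-of-the-ozarks" still matches the stricter pattern, while "postcollege" does not because there is no boundary between "post" and "college".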

Upvotes: 2

Wiktor Stribiżew

Reputation: 626893

If the client name section of the URL is after the access-guide/ and before the next /:

http://www.disabledgo.com/access-guide/the-university-of-manchester/176-waterloo-place-2
                                      |----------------------------|

you need to use a negated character class to only match university before the regex reaches that rightmost / boundary.

As per the Reference:

You can extract pages by Page URL, Page Title, or Screen Name. Identify each one with a regex capture group (Analytics uses the first capture group for each expression)

Thus, you can use

/access-guide/([^/]*(universit(y|ies)|colleges?))
              ^^^^^


The regex matches

  • /access-guide/ - leftmost boundary, matches /access-guide/ literally
  • [^/]* - zero or more characters other than / (so we still remain in that customer section)
  • (universit(y|ies)|colleges?) - university, or universities, or college or colleges literally. Add more if needed.
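
To make the breakdown concrete, here is a minimal sketch showing what the first capture group holds for one of the question's URLs. It assumes Python's re module as a stand-in for the Analytics engine. The second pattern, WHOLE, is an assumed variant that is not part of this answer: it appends [^/]* inside the first group so that the capture runs to the next slash.

```python
import re

# Pattern from this answer; group 1 is the group Analytics would use.
ANSWER = re.compile(r"/access-guide/([^/]*(universit(y|ies)|colleges?))")

# Assumed variant (not from the answer): extend group 1 up to the next slash.
WHOLE = re.compile(r"/access-guide/([^/]*(?:universit(?:y|ies)|colleges?)[^/]*)")

url = (
    "http://www.disabledgo.com/access-guide/"
    "the-university-of-manchester/176-waterloo-place-2"
)

up_to_keyword = ANSWER.search(url).group(1)
whole_segment = WHOLE.search(url).group(1)

print(up_to_keyword)
print(whole_segment)
```

With the answer's pattern the first group ends at the keyword ("the-university"), so names that continue after it are cut short; the variant keeps the rest of the segment ("the-university-of-manchester"). Which behaviour is preferable depends on how the groups should be labelled.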

Upvotes: 0
