Reputation: 29
This is our URL structure:
http://www.disabledgo.com/access-guide/the-university-of-manchester/176-waterloo-place-2
http://www.disabledgo.com/access-guide/kingston-university/coombehurst-court-2
http://www.disabledgo.com/access-guide/kings-college-london/franklin-wilkins-building-2
http://www.disabledgo.com/access-guide/redbridge-college/brook-centre-learning-resource-centre
I am trying to create a list of groups based on the client names
/access-guide/[this bit]/...
So I can have a performance list of all our clients.
This is my regex:
/access-guide/(.*universit(y|ies)|.*colleg(e|es))/
I want it to group anything that has university/ies or college/es in it, at any point within that client name section of the URL.
At the moment, my current regex will only return groups that are X-University:
Durham-University
Plymouth-University
Cardiff-University
etc.
What does the regex need to be to have the list I'm looking for?
Do I need to have something at the end to stop it matching things after the client name? E.g. ([^/]+$)
?
Thanks for your help in advance!
Upvotes: 1
Views: 583
Reputation: 38919
Depending upon your needs you may want to do:
/access-guide/([^/]*(?:university|universities|college|colleges)[^/]*)/
This will match names even if "university" or "college" is not at the end of the string. For example "college-of-the-ozarks" Note the non-capturing internal parenthesis, that should probably be used no matter what solution you go with, as you don't want to just match the word "university" or "college"
Additionally, I don't know what may be in your but if you may have compound words you want to eliminate using a \b
may be advisable. For instance if you don't want to match "miskatonic-postcollege" you may want to do something like this:
/access-guide/([^/]*\b(?:university|universities|college|colleges)\b[^/]*)/
Upvotes: 2
Reputation: 626893
If the client name section of the URL is after the access-guid/
and before the next /
:
http://www.disabledgo.com/access-guide/the-university-of-manchester/176-waterloo-place-2
|----------------------------|
you need to use a negated character class to only match university
before the regex reaches that rightmost /
boundary.
As per the Reference:
You can extract pages by Page URL, Page Title, or Screen Name. Identify each one with a regex capture group (Analytics uses the first capture group for each expression)
Thus, you can use
/access-guide/([^/]*(universit(y|ies)|colleges?))
^^^^^
See demo.
The regex matches
/access-guide/
- leftmost boundary, matches /access-guide/
literally[^/]*
- any character other than /
(so we still remain in that customer section)(universit(y|ies)|colleges?)
- university
, or universities, or
collegeor
colleges` literally. Add more if needed.Upvotes: 0