jimbob

Reputation: 33

Extracting data from URL path via regex

I am trying to extract data from a URL path as follows:

/12345678901234567890123456789012/1230345035/wibble/wobble/

With this regex I can extract 3 groups:

\/([^\/]*)\/([^\/]*)(\/wibble\/wobble)

Which gives me:

group 1 = 12345678901234567890123456789012  
group 2 = /1230345035  
group 3 = /wibble/wobble  

However, this isn't quite what I need. I am trying to get the data extracted in group 2 to also appear in group 3, like this:

group 1 = 12345678901234567890123456789012  
group 2 = /1230345035  
group 3 = /1230345035/wibble/wobble 

But I am afraid I am struggling to write a regex that extracts the data like this.

Thank you

Upvotes: 3

Views: 71

Answers (1)

PaSTE

Reputation: 4548

For starters, the regex you gave shouldn't put the leading path separator in group 2. The slash before group 2 is matched outside the parentheses, so it is not captured; only group 3 keeps its leading slash, because that \/ sits inside the group. You should be seeing something like this:

group 1 = 12345678901234567890123456789012  
group 2 = 1230345035
group 3 = /wibble/wobble

It's a little easier to capture the last three elements together as group 2, then capture the first of those elements as group 3 by nesting one capture group inside the other. Groups are numbered by the position of their opening parenthesis, so the outer group comes first:

\/([^\/]*)\/(([^\/]*)\/wibble\/wobble)

\/               # opening slash
([^\/]*)         # anything that is not a slash, repeated 0+ times, as group 1
\/               # separating slash
(                # begin group 2
([^\/]*)         # anything that is not a slash, repeated 0+ times, as group 3
\/wibble\/wobble # literal text to match
)                # end group 2

This should give you the following matches:

group 1 = 12345678901234567890123456789012  
group 2 = 1230345035/wibble/wobble
group 3 = 1230345035
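
As a quick check, here is a minimal Python sketch of the nested-group pattern, run against the path from the question. The choice of Python, the helper name `extract`, and the `WITH_SLASHES` variant are my own assumptions for illustration; the variant simply moves the separator inside the groups in case you do want the leading slash captured.

```python
import re

# Nested-group pattern from the answer. In Python "/" needs no escaping.
NESTED = re.compile(r"/([^/]*)/(([^/]*)/wibble/wobble)")

# Hypothetical variant: the separator is moved inside the groups so that
# groups 2 and 3 keep their leading slash.
WITH_SLASHES = re.compile(r"/([^/]*)((/[^/]*)/wibble/wobble)")


def extract(pattern, path):
    """Return the captured groups as a tuple, or None if nothing matches."""
    match = pattern.search(path)
    return match.groups() if match else None


path = "/12345678901234567890123456789012/1230345035/wibble/wobble/"

print(extract(NESTED, path))
# ('12345678901234567890123456789012', '1230345035/wibble/wobble', '1230345035')

print(extract(WITH_SLASHES, path))
# ('12345678901234567890123456789012', '/1230345035/wibble/wobble', '/1230345035')
```

Either way the enclosing group gets the lower number. If the numbering matters to the calling code, named groups such as `(?P<full>...)` are one way to avoid depending on it.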

Upvotes: 1
