Christian P.

Reputation: 4884

Pattern matching issue

I have the following URL path:

I wish to capture the different segments. Everything up to and including the .mp4 is fairly easy, but it gets tricky after that with the following sub-segment:

media_u11bgy04l_b282848_qdGltZT0xMzgwMjA0ODMzJnNlc3Npb249MjE2ODcxNzI3NTc=.abst/Seg1-Frag74

I wish to capture this so I have three matches:

  1. media_u11bgy04l_b282848_qdGltZT0xMzgwMjA0ODMzJnNlc3Npb249MjE2ODcxNzI3NTc=
  2. .abst
  3. /Seg1-Frag74

The idea is that #2 can be different formats (it's for livestreaming, so we have .f4m and .m3u8) and #1 is basically something I just need to skip. #3 is optional (not always present), so it must match even if nothing follows #2.

I have tried the following: (.*?)(\.abst|\.f4m|\.m3u8)?(.*)

But the result is the following (I am using Python, hence the None):

  1. '' (empty string)
  2. None
  3. media_u11bgy04l_b282848_qdGltZT0xMzgwMjA0ODMzJnNlc3Npb249MjE2ODcxNzI3NTc=.abst/Seg1-Frag74

If I change it to the following, (.*)(\.abst|\.f4m|\.m3u8)?(.*), I get:

  1. media_u11bgy04l_b282848_qdGltZT0xMzgwMjA0ODMzJnNlc3Npb249MjE2ODcxNzI3NTc=.abst/Seg1-Frag74
  2. None
  3. '' (empty string)

The 2nd part is optional because we want to capture unexpected input (and throw an error so we can investigate) in case of malformed requests or something we missed (where it's not one of the pre-specified playlist types or similar).

I am open to using a non-regex solution; I am just unsure how to approach this. Any help is appreciated.

Upvotes: 1

Views: 149

Answers (2)

Jerry

Reputation: 71538

You can perhaps try something like...

r'(.*?)(\.[^/]+)(.*)'

[^/]+ will allow you to get different extensions as well. If you want to get only those you mentioned, just use (\.abst|\.f4m|\.m3u8) instead of (\.[^/]+) (don't put back the ?)
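
As a minimal sketch of how both variants behave in Python, assuming the pattern is applied only to the sub-segment from the question (not to the full URL path, where an earlier dot such as the one in .mp4 would be matched first). The variable names are just for illustration:

```python
import re

# Sub-segment taken from the question.
segment = ("media_u11bgy04l_b282848_"
           "qdGltZT0xMzgwMjA0ODMzJnNlc3Npb249MjE2ODcxNzI3NTc="
           ".abst/Seg1-Frag74")

# Any extension: a dot followed by everything up to the next slash.
any_ext = re.compile(r'(.*?)(\.[^/]+)(.*)')

# Only the extensions listed in the question (note: no trailing ?).
known_ext = re.compile(r'(.*?)(\.abst|\.f4m|\.m3u8)(.*)')

full = any_ext.match(segment).groups()
strict = known_ext.match(segment).groups()

# Group 3 is simply an empty string when nothing follows the extension.
no_tail = known_ext.match("media_example.m3u8").groups()

print(full)
print(strict)
print(no_tail)
```

Because group 3 is (.*), it matches the empty string when nothing follows the extension, so the optional third part is still handled.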


The ? in your regex was preventing the correct match:

(.*?)(\.abst|\.f4m|\.m3u8)?(.*)

Here, at the start of the string, the lazy (.*?) first tries to match nothing, and (\.abst|\.f4m|\.m3u8)? then succeeds with an empty (null) match at that same position, i.e. at the start of the string. The final (.*) then consumes everything.

(.*)(\.abst|\.f4m|\.m3u8)?(.*)

Here, (.*) is greedy, so it runs to the end of the string, and (\.abst|\.f4m|\.m3u8)? again succeeds with an empty (null) match there, leaving nothing for the last group.
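
The two cases can be reproduced with a short sketch (the variable names are illustrative, and the input is the sub-segment from the question):

```python
import re

segment = ("media_u11bgy04l_b282848_"
           "qdGltZT0xMzgwMjA0ODMzJnNlc3Npb249MjE2ODcxNzI3NTc="
           ".abst/Seg1-Frag74")

# Lazy first group + optional second group:
# group 1 matches nothing, group 2 is skipped, group 3 takes the whole string.
lazy = re.match(r'(.*?)(\.abst|\.f4m|\.m3u8)?(.*)', segment).groups()

# Greedy first group + optional second group:
# group 1 takes the whole string, group 2 is skipped, group 3 is empty.
greedy = re.match(r'(.*)(\.abst|\.f4m|\.m3u8)?(.*)', segment).groups()

print(lazy)
print(greedy)
```

In both cases the optional group is allowed to match nothing, so the regex engine never has to backtrack to find the extension.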

Upvotes: 1

Toto

Reputation: 91385

Don't make the second group optional, and there is no need to capture groups 1 and 3:

.*?(\.abst|\.f4m|\.m3u8).*?
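
A rough sketch of how this could be used in Python while still flagging unexpected input, as the question asks. The function name playlist_extension and the choice of ValueError are assumptions for illustration, not part of the original answer:

```python
import re

# Pattern from this answer: only the extension is captured.
EXT_RE = re.compile(r'.*?(\.abst|\.f4m|\.m3u8).*?')


def playlist_extension(segment):
    """Return the known extension in segment, or raise on unexpected input.

    Hypothetical helper: with a required group, a failed match (None)
    is the signal that the request is malformed or of an unknown type.
    """
    m = EXT_RE.match(segment)
    if m is None:
        raise ValueError("unrecognised segment: %r" % segment)
    return m.group(1)


print(playlist_extension("media_example.abst/Seg1-Frag74"))
```

With the group required, there is no need to make it optional just to detect bad requests: the absence of a match already indicates them.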

Upvotes: 1
