Reputation: 4884
I have the following URL path:
I wish to capture the different segments. Everything up to and including the .mp4
is fairly easy, but it gets tricky after that with the following sub-segment:
media_u11bgy04l_b282848_qdGltZT0xMzgwMjA0ODMzJnNlc3Npb249MjE2ODcxNzI3NTc=.abst/Seg1-Frag74
I wish to capture this so I have three matches:
media_u11bgy04l_b282848_qdGltZT0xMzgwMjA0ODMzJnNlc3Npb249MjE2ODcxNzI3NTc=
.abst
/Seg1-Frag74
The idea is that #2 can be different formats (it's for livestreaming, so we also have .f4m and .m3u8) and #1 is basically something I just need to skip. #3 is optional (not always present), so the pattern must match even if nothing follows #2.
I have tried the following: (.*?)(\.abst|\.f4m|\.m3u8)?(.*)
But the result is the following (I am using Python, hence the None):
If I change it to the following, (.*)(\.abst|\.f4m|\.m3u8)?(.*), I get:
The 2nd part is optional because we want to capture unexpected input (and throw an error so we can investigate) in case of malformed requests or something we missed (where it's not one of the pre-specified playlist types or similar).
I am open to using a non-regex solution; I am just unsure how to approach this. Any help is appreciated.
Upvotes: 1
Views: 149
Reputation: 71538
You can perhaps try something like...
r'(.*?)(\.[^/]+)(.*)'
[^/]+ will allow you to get different extensions as well. If you want to get only those you mentioned, just use (\.abst|\.f4m|\.m3u8) instead of (\.[^/]+) (don't put back the ?).
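To tie this back to the requirement of flagging unexpected input, here is a minimal sketch of how the generic pattern could be used in Python. The helper name split_segment and the KNOWN_EXTENSIONS set are hypothetical, introduced only for illustration; the sample string is the sub-segment from the question. It assumes the input is only the sub-segment (a full path containing .mp4 would match on that first dot instead).

```python
import re

# Hypothetical whitelist of playlist types, taken from the question.
KNOWN_EXTENSIONS = {".abst", ".f4m", ".m3u8"}

# Generic pattern from the answer: lazy prefix, ".something" up to the
# next slash, then whatever is left (possibly nothing).
SEGMENT_RE = re.compile(r'(.*?)(\.[^/]+)(.*)')

def split_segment(segment):
    """Return (name, extension, rest); raise ValueError on unexpected input."""
    m = SEGMENT_RE.match(segment)
    if m is None:
        raise ValueError("no extension found in %r" % segment)
    name, ext, rest = m.groups()
    if ext not in KNOWN_EXTENSIONS:
        raise ValueError("unexpected extension %r in %r" % (ext, segment))
    return name, ext, rest

sample = ("media_u11bgy04l_b282848_"
          "qdGltZT0xMzgwMjA0ODMzJnNlc3Npb249MjE2ODcxNzI3NTc=.abst/Seg1-Frag74")
name, ext, rest = split_segment(sample)
print(name)
print(ext)   # .abst
print(rest)  # /Seg1-Frag74
```

When nothing follows the extension, the third group is an empty string rather than None, so the optional #3 is still handled.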
The ? in your regex was preventing the correct match:
(.*?)(\.abst|\.f4m|\.m3u8)?(.*)
Here, at the start of the string, (.*?) first tries to match nothing, and (\.abst|\.f4m|\.m3u8)? is also allowed to match nothing (null) at the same point, i.e. at the start of the string. The final (.*) then consumes the whole string.
(.*)(\.abst|\.f4m|\.m3u8)?(.*)
Here, (.*) is greedy, so you end up at the end of the string, where the attempt to match (\.abst|\.f4m|\.m3u8)? again succeeds with an empty (null) match.
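The two behaviours described above can be reproduced with a short Python sketch. The sample string is the sub-segment from the question; the variable names are only illustrative.

```python
import re

sample = ("media_u11bgy04l_b282848_"
          "qdGltZT0xMzgwMjA0ODMzJnNlc3Npb249MjE2ODcxNzI3NTc=.abst/Seg1-Frag74")

# Lazy first group + optional second group: group 1 matches nothing,
# group 2 is skipped (None), group 3 swallows the whole string.
lazy = re.match(r'(.*?)(\.abst|\.f4m|\.m3u8)?(.*)', sample)
print(lazy.groups())

# Greedy first group + optional second group: group 1 swallows the whole
# string, group 2 is skipped (None), group 3 is left with nothing.
greedy = re.match(r'(.*)(\.abst|\.f4m|\.m3u8)?(.*)', sample)
print(greedy.groups())
```

In both cases the regex engine is never forced to backtrack, because an optional group that matches nothing still counts as a success.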
Upvotes: 1
Reputation: 91385
Don't make the second group optional, and there is no need to capture groups 1 and 3:
.*?(\.abst|\.f4m|\.m3u8).*?
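As a rough sketch of how this could look in Python: with only the extension captured, the surrounding parts can still be recovered from the match offsets if they are needed. The sample string is the one from the question, and slicing by m.start(1) and m.end(1) is just one possible approach.

```python
import re

sample = ("media_u11bgy04l_b282848_"
          "qdGltZT0xMzgwMjA0ODMzJnNlc3Npb249MjE2ODcxNzI3NTc=.abst/Seg1-Frag74")

m = re.match(r'.*?(\.abst|\.f4m|\.m3u8).*?', sample)
if m is None:
    # Not one of the known playlist types: treat as unexpected input.
    raise ValueError("unexpected request: %r" % sample)

ext = m.group(1)              # the only captured group
before = sample[:m.start(1)]  # everything in front of the extension
after = sample[m.end(1):]     # everything behind it, '' if nothing follows
print(ext)    # .abst
print(after)  # /Seg1-Frag74
```

Because the required group makes the match fail on unknown extensions, a None result from re.match can serve as the signal for malformed requests.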
Upvotes: 1