Reputation: 59
In Python regex, how would I match only the facebook.com...777 substring from either of the strings below? I don't want the ?sfnsn=mo at the end.
I have (?<=https://m\.)([^\s]+) to match everything after https://m. and (?=\?sfnsn) to match everything in front of ?sfnsn.
How do I combine these into one regex that returns only the facebook.com...777 part for either string?
have: https://m.facebook.com/story.php?story_fbid=123456789&id=7777777777?sfnsn=mo
want: facebook.com/story.php?story_fbid=123456789&id=7777777777
have: https://m.facebook.com/story.php?story_fbid=123456789&id=7777777777
want: facebook.com/story.php?story_fbid=123456789&id=7777777777
Here's what I was messing around with (https://regex101.com/r/WYz5dn/2):
(?<=https://m\.)([^\s]+)(?=\?sfnsn)
Upvotes: 0
Views: 310
Reputation: 2022
Putting a ? at the end works. Since the final lookahead may or may not be present, we put a question mark after it:
(?<=https://m\.)([^\s]+)(?=\?sfnsn)?
Upvotes: -1
Reputation: 163207
You could use a capturing group instead of a positive lookbehind and match either ?sfnsn or the end of the string.
https://m\.(\S*?)(?:\?sfnsn|$)
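As a rough sketch of how the capturing-group pattern could be used from Python: the helper name strip_mobile_prefix is only an illustration (it is not from the question), and the two sample URLs are the ones posted above. The wanted text ends up in group 1 rather than in the full match.

```python
import re

# Pattern from the answer: capture lazily after "https://m." and stop
# at "?sfnsn" or at the end of the string, whichever comes first.
CAPTURE_PATTERN = re.compile(r"https://m\.(\S*?)(?:\?sfnsn|$)")


def strip_mobile_prefix(url):
    """Return the part after 'https://m.' without a trailing ?sfnsn=..., or None."""
    # Hypothetical helper name, used here for illustration only.
    match = CAPTURE_PATTERN.search(url)
    return match.group(1) if match else None


urls = [
    "https://m.facebook.com/story.php?story_fbid=123456789&id=7777777777?sfnsn=mo",
    "https://m.facebook.com/story.php?story_fbid=123456789&id=7777777777",
]

for url in urls:
    print(strip_mobile_prefix(url))
```

The lazy \S*? matters here: a greedy \S* would run to the end of the string first, and the $ alternative would then succeed with ?sfnsn=mo still included in the group.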
Using the lookarounds, the pattern could be:
(?<=https://m\.)\S*?(?=\?sfnsn|$)
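The lookaround version can be tried the same way. This is a minimal sketch assuming each input string holds a single URL; the function name extract_with_lookarounds is made up for the example. Because the prefix and suffix are only asserted and not consumed, the full match (group 0) is already the wanted text.

```python
import re

# Lookaround pattern from the answer: assert "https://m." behind the match
# and either "?sfnsn" or the end of the string ahead of it.
LOOKAROUND_PATTERN = re.compile(r"(?<=https://m\.)\S*?(?=\?sfnsn|$)")


def extract_with_lookarounds(url):
    """Return the full match of the lookaround pattern, or None if absent."""
    # Hypothetical helper name, used here for illustration only.
    match = LOOKAROUND_PATTERN.search(url)
    return match.group(0) if match else None


for url in (
    "https://m.facebook.com/story.php?story_fbid=123456789&id=7777777777?sfnsn=mo",
    "https://m.facebook.com/story.php?story_fbid=123456789&id=7777777777",
):
    print(extract_with_lookarounds(url))
```

The lookbehind works in Python's re module because https://m\. has a fixed width; a variable-width prefix would need the capturing-group form instead.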
Upvotes: 2