Reputation: 999
I have the following partial URL:
"/tennis/qatar/atp-doha-2009/hernych-jan-monfils-gael-S8Lm3D4l/"
I'd like to get everything after the second-to-last / and excluding the final /, so:
"hernych-jan-monfils-gael-S8Lm3D4l"
I've got as far as:
re.search(r".*/(.*?/.*)", url)
Which gets me:
"hernych-jan-monfils-gael-S8Lm3D4l/"
But I can't figure out how to get rid of the final slash. Could someone point me in the right direction?
Upvotes: 1
Views: 775
Reputation: 204
For a more pythonic approach, you can also use:
"/tennis/qatar/atp-doha-2009/hernych-jan-monfils-gael-S8Lm3D4l/".split('/')[-2]
str.split returns a list of substrings, split on the provided delimiter (in this case, '/'). To break down the statement above:
s = "/tennis/qatar/atp-doha-2009/hernych-jan-monfils-gael-S8Lm3D4l/".split('/')
print(s)
>> ['', 'tennis', 'qatar', 'atp-doha-2009', 'hernych-jan-monfils-gael-S8Lm3D4l', '']
print(s[-2])
>> hernych-jan-monfils-gael-S8Lm3D4l
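Note that the index -2 relies on the trailing slash being present. A minimal sketch, assuming the URL may or may not end with '/', strips the slashes before splitting (the helper name last_segment is hypothetical, not part of the answer above):

```python
def last_segment(url: str) -> str:
    """Return the final path segment, with or without a trailing slash."""
    # strip("/") removes leading and trailing slashes, so the last
    # element of the split is always the final segment.
    return url.strip("/").split("/")[-1]


with_slash = "/tennis/qatar/atp-doha-2009/hernych-jan-monfils-gael-S8Lm3D4l/"
without_slash = "/tennis/qatar/atp-doha-2009/hernych-jan-monfils-gael-S8Lm3D4l"

print(last_segment(with_slash))
print(last_segment(without_slash))
```

Both calls print the same segment, whereas split('/')[-2] would return 'atp-doha-2009' for the second URL.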
Upvotes: 1
Reputation: 117
This requires an extra step, but you could assign the matched string to a variable and then use slicing to remove its last character.
url = "hernych-jan-monfils-gael-S8Lm3D4l/"
url = url[:-1]
That removes the trailing slash, but I'm sure there are other ways to do it in one line.
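One caveat: url[:-1] drops the last character whatever it is. A small sketch, assuming the input might sometimes arrive without the trailing slash, guards the slice with endswith (the function name drop_trailing_slash is made up for illustration):

```python
def drop_trailing_slash(segment: str) -> str:
    """Remove one trailing '/' if present; otherwise return the string unchanged."""
    # Slice only when the last character really is a slash.
    return segment[:-1] if segment.endswith("/") else segment


print(drop_trailing_slash("hernych-jan-monfils-gael-S8Lm3D4l/"))
print(drop_trailing_slash("hernych-jan-monfils-gael-S8Lm3D4l"))
```

Both calls print the segment without a slash, and the second one does not lose its final letter.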
Upvotes: 0
Reputation: 31619
You can do it like this:
s = "/tennis/qatar/atp-doha-2009/hernych-jan-monfils-gael-S8Lm3D4l/"
s = "/".join(s.split("/")[-2:]) # Same result as your regex: the last segment plus the trailing slash
s = s.rstrip("/") # to remove the last slash
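If you would rather not handle the slashes by hand, the standard library's pathlib can do it. This is only a sketch, and it assumes the URL is a plain path with no query string or fragment, since it treats the path like a POSIX file path (the function name segment_via_pathlib is hypothetical):

```python
from pathlib import PurePosixPath


def segment_via_pathlib(url_path: str) -> str:
    """Return the final component of a slash-separated path."""
    # PurePosixPath ignores a trailing slash, so .name is the last
    # real component of the path.
    return PurePosixPath(url_path).name


s = "/tennis/qatar/atp-doha-2009/hernych-jan-monfils-gael-S8Lm3D4l/"
print(segment_via_pathlib(s))
```

PurePosixPath never touches the file system, so this is safe to use on strings that are not real files.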
Upvotes: 1
Reputation: 18611
Use
^.*?/([^/]*)/?$
Explanation
--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  .*?                      any character except \n (0 or more times
                           (matching the least amount possible))
--------------------------------------------------------------------------------
  /                        '/'
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [^/]*                    any character except: '/' (0 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  /?                       '/' (optional (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
import re
regex = r"^.*?/([^/]*)/?$"
text = "/tennis/qatar/atp-doha-2009/hernych-jan-monfils-gael-S8Lm3D4l/"
print(re.findall(regex, text))
Result: ['hernych-jan-monfils-gael-S8Lm3D4l']
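Since re.findall returns a list, you may prefer a single string. A minimal sketch, using the same pattern with re.search and returning None when the input contains no slash at all (the names SEGMENT_RE and extract_segment are made up for this example):

```python
import re
from typing import Optional

# Same pattern as above, compiled once for reuse.
SEGMENT_RE = re.compile(r"^.*?/([^/]*)/?$")


def extract_segment(url: str) -> Optional[str]:
    """Return the last path segment, or None if the pattern does not match."""
    match = SEGMENT_RE.search(url)
    # group(1) is the capture group: the text between the last two slashes,
    # or after the last slash when there is no trailing one.
    return match.group(1) if match else None


text = "/tennis/qatar/atp-doha-2009/hernych-jan-monfils-gael-S8Lm3D4l/"
print(extract_segment(text))
print(extract_segment("no-slash-here"))
```

The first call prints the segment and the second prints None, so callers can tell "no match" apart from an empty segment.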
Upvotes: 3