Reputation: 2621
The program I am currently working on retrieves URLs from a website and puts them into a list. What I want to get is the last section of the URL.
So, if the first element in my list of URLs is "https://docs.python.org/3.4/tutorial/interpreter.html", I would want to remove everything before "interpreter.html".
Is there a function, library, or regex I could use to make this happen? I've looked at other Stack Overflow posts but the solutions don't seem to work.
These are two of my several attempts:
for link in link_list:
    file_names.append(link.replace('/[^/]*$',''))
print(file_names)
and
for link in link_list:
    file_names.append(link.rpartition('//')[-1])
print(file_names)
Upvotes: 25
Views: 47389
Reputation: 315
Here's a more general, regex-based way of doing this:
>>> import re
>>> re.sub(r'^.+/([^/]+)$', r'\1', "http://test.org/3/files/interpreter.html")
'interpreter.html'
Upvotes: 1
Reputation: 52071
Have a look at str.rsplit.
>>> s = 'https://docs.python.org/3.4/tutorial/interpreter.html'
>>> s.rsplit('/',1)
['https://docs.python.org/3.4/tutorial', 'interpreter.html']
>>> s.rsplit('/',1)[1]
'interpreter.html'
Or, to use a regex:
>>> import re
>>> re.search(r'(.*)/(.*)',s).group(2)
'interpreter.html'
Group 2 is whatever lies between the last / and the end of the string. This works because .* is greedy: the first group consumes as much as it can, so the / that the pattern matches is the last one in the string.
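To make the greedy behaviour concrete, here is a small sketch comparing the pattern above with a lazy variant. The lazy pattern is only there for contrast and is not part of the original answer.

```python
import re

s = "https://docs.python.org/3.4/tutorial/interpreter.html"

# Greedy: the first group grabs as much as possible, so the split
# happens at the LAST slash.
greedy = re.search(r"(.*)/(.*)", s)
print(greedy.groups())

# Lazy (shown only for contrast): the first group grabs as little as
# possible, so the split happens at the FIRST slash instead.
lazy = re.search(r"(.*?)/(.*)", s)
print(lazy.groups())
```

With the greedy pattern, group 2 is the file name; with the lazy one, group 1 is just 'https:' and group 2 holds everything after the first slash.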
Small Note - The problem with link.rpartition('//')[-1] in your code is that you are trying to match // and not /. So remove the extra /, as in link.rpartition('/')[-1].
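Applied to the whole list from the question, the rsplit approach might look like the sketch below. The second URL in link_list is a made-up example, and [-1] is used instead of [1] so that a string without any slash does not raise an IndexError.

```python
# Sample data; the second entry is a hypothetical URL for illustration.
link_list = [
    "https://docs.python.org/3.4/tutorial/interpreter.html",
    "https://docs.python.org/3.4/tutorial/index.html",
]

# Split once from the right and keep the last piece of each link.
file_names = [link.rsplit("/", 1)[-1] for link in link_list]
print(file_names)
```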
Upvotes: 54
Reputation: 49320
That doesn't need regex.
import os
for link in link_list:
    file_names.append(os.path.basename(link))
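One caveat: os.path.basename is designed for filesystem paths, so a query string or fragment on the URL would end up in the result. A minimal sketch, assuming the links may carry such suffixes, parses the URL first. The helper name last_segment and the "?highlight=x#top" suffix are made up for illustration.

```python
import posixpath
from urllib.parse import urlsplit


def last_segment(url):
    """Return the final path segment of a URL, ignoring query and fragment."""
    path = urlsplit(url).path        # e.g. '/3.4/tutorial/interpreter.html'
    return posixpath.basename(path)  # always splits on '/', on any OS


# Hypothetical URL with a query string and fragment appended.
print(last_segment("https://docs.python.org/3.4/tutorial/interpreter.html?highlight=x#top"))
```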
Upvotes: 14
Reputation: 103744
You can use rpartition():
>>> s = 'https://docs.python.org/3.4/tutorial/interpreter.html'
>>> s.rpartition('/')
('https://docs.python.org/3.4/tutorial', '/', 'interpreter.html')
Then take the last element of the 3-element tuple that is returned:
>>> s.rpartition('/')[2]
'interpreter.html'
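A detail that may matter when some entries contain no slash at all: rpartition still returns a 3-tuple, with the whole string in the last position, so indexing with [2] stays safe. A short sketch using a made-up slash-free input:

```python
s = "interpreter.html"  # hypothetical input with no '/' in it

# When the separator is missing, rpartition returns two empty strings
# followed by the original string.
parts = s.rpartition("/")
print(parts)
print(parts[2])
```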
Upvotes: 9
Reputation: 1153
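This should work if you plan to use regex. Note that str.replace treats its argument as a literal string, not a pattern, so use re.sub instead:
import re
for link in link_list:
    file_names.append(re.sub(r'.*/', '', link))
print(file_names)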
Upvotes: 0
Reputation: 1161
Just use str.split:
url = "/some/url/with/a/file.html"
print(url.split("/")[-1])
# Result should be "file.html"
split gives you a list of the strings that were separated by "/". The [-1] gives you the last element of that list, which is what you want.
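To see what split produces before the [-1] is applied, here is a small sketch. The trailing-slash URL is a made-up example added to show one edge case.

```python
url = "/some/url/with/a/file.html"

# The leading "/" yields an empty string as the first element.
parts = url.split("/")
print(parts)      # ['', 'some', 'url', 'with', 'a', 'file.html']
print(parts[-1])  # file.html

# Hypothetical edge case: a trailing slash makes the last element empty.
trailing = "/some/url/with/a/".split("/")[-1]
print(repr(trailing))  # ''
```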
Upvotes: 2