Reputation: 55
I want to extract only the numbers from multiple URLs.
Does anyone know how to do this?
Thanks in advance.
Here are the URLs:
https://www.net-a-porter.com/en-gb/shop/product/loewe/bags/clutch-bags/flamenco-mini-leather-clutch/17411127376741793
https://www.net-a-porter.com/en-gb/shop/product/saint-laurent/bags/cross-body/loulou-medium-quilted-leather-shoulder-bag/20346390236017004
https://www.net-a-porter.com/en-gb/shop/product/saint-laurent/bags/cross-body/solferino-medium-leather-shoulder-bag/20346390236017013
https://www.net-a-porter.com/en-gb/shop/product/saint-laurent/bags/shoulder-bags/niki-baby-mini-quilted-crinkled-glossed-leather-shoulder-bag/20346390236017058
https://www.net-a-porter.com/en-gb/shop/product/loewe/bags/clutch-bags/flamenco-mini-leather-clutch/17411127376741793
https://www.net-a-porter.com/en-gb/shop/product/saint-laurent/bags/cross-body/lou-medium-quilted-leather-shoulder-bag/20346390236017001
https://www.net-a-porter.com/en-gb/shop/product/saint-laurent/bags/cross-body/solferino-small-leather-shoulder-bag/20346390236017074
https://www.net-a-porter.com/en-gb/shop/product/transience/bags/shoulder-bags/fortune-shell-shoulder-bag/16114163150620652
https://www.net-a-porter.com/en-gb/shop/product/jimmy-choo/bags/shoulder-bags/callie-tasseled-chainmail-trimmed-crinkled-leather-shoulder-bag/13452677151806735
https://www.net-a-porter.com/en-gb/shop/product/tom-ford/bags/cross-body/padlock-mini-textured-leather-shoulder-bag/13452677153399511
I want to get a result like this:
17411127376741793
20346390236017004
20346390236017013
20346390236017058
17411127376741793
20346390236017001
20346390236017074
16114163150620652
13452677151806735
13452677153399511
Upvotes: 1
Views: 46
Reputation: 333
Assuming the URLs are in a list like so:
items_list = [
'https://www.net-a-porter.com/en-gb/shop/product/loewe/bags/clutch-bags/flamenco-mini-leather-clutch/17411127376741793',
'https://www.net-a-porter.com/en-gb/shop/product/saint-laurent/bags/cross-body/loulou-medium-quilted-leather-shoulder-bag/20346390236017004',
'https://www.net-a-porter.com/en-gb/shop/product/saint-laurent/bags/cross-body/solferino-medium-leather-shoulder-bag/20346390236017013',
'https://www.net-a-porter.com/en-gb/shop/product/saint-laurent/bags/shoulder-bags/niki-baby-mini-quilted-crinkled-glossed-leather-shoulder-bag/20346390236017058',
'https://www.net-a-porter.com/en-gb/shop/product/loewe/bags/clutch-bags/flamenco-mini-leather-clutch/17411127376741793',
'https://www.net-a-porter.com/en-gb/shop/product/saint-laurent/bags/cross-body/lou-medium-quilted-leather-shoulder-bag/20346390236017001',
'https://www.net-a-porter.com/en-gb/shop/product/saint-laurent/bags/cross-body/solferino-small-leather-shoulder-bag/20346390236017074',
'https://www.net-a-porter.com/en-gb/shop/product/transience/bags/shoulder-bags/fortune-shell-shoulder-bag/16114163150620652',
'https://www.net-a-porter.com/en-gb/shop/product/jimmy-choo/bags/shoulder-bags/callie-tasseled-chainmail-trimmed-crinkled-leather-shoulder-bag/13452677151806735',
'https://www.net-a-porter.com/en-gb/shop/product/tom-ford/bags/cross-body/padlock-mini-textured-leather-shoulder-bag/13452677153399511'
]
You just need to collect the number from each URL:
import re

num_values = []
for item in items_list:
    # The ID is the last run of digits in each URL.
    num_values.append(re.findall(r"[0-9]+", item)[-1])
print(num_values)
Output:
['17411127376741793',
'20346390236017004',
'20346390236017013',
'20346390236017058',
'17411127376741793',
'20346390236017001',
'20346390236017074',
'16114163150620652',
'13452677151806735',
'13452677153399511']
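If the real URLs might carry a trailing slash or a query string, a slightly stricter variant could help. This is only a sketch under that assumption: the helper name extract_id and the last two sample URLs are made up for illustration and are not from the question. It anchors the pattern to the end of the URL path, and returns None when there is no numeric ID.

```python
import re
from urllib.parse import urlsplit

# Digits forming the final path segment, with an optional trailing slash.
ID_PATTERN = re.compile(r"/(\d+)/?$")


def extract_id(url):
    """Return the trailing numeric ID as a string, or None if there is none."""
    path = urlsplit(url).path  # ignores any ?query or #fragment part
    match = ID_PATTERN.search(path)
    return match.group(1) if match else None


sample_urls = [
    # Taken from the question.
    "https://www.net-a-porter.com/en-gb/shop/product/loewe/bags/clutch-bags/flamenco-mini-leather-clutch/17411127376741793",
    # Hypothetical variant: trailing slash plus a query string.
    "https://www.net-a-porter.com/en-gb/shop/product/saint-laurent/bags/cross-body/loulou-medium-quilted-leather-shoulder-bag/20346390236017004/?colour=black",
    # Hypothetical URL without a numeric ID.
    "https://www.net-a-porter.com/en-gb/shop/",
]

results = [extract_id(url) for url in sample_urls]
print(results)
```

The IDs stay as strings here; wrap the result in int(...) if you need actual numbers.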
Upvotes: 1
Reputation: 1644
You can split the URLs at each / sign and save the last part.
urls = [
'https://www.net-a-porter.com/en-gb/shop/product/loewe/bags/clutch-bags/flamenco-mini-leather-clutch/17411127376741793',
#...links
]
for url in urls:
    num = url.split('/')[-1]  # last element
    print(num)
The downside of this method is that you need to copy each URL into your code. If the URLs are stored in an Excel file, you can read them from the file instead.
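One way to avoid pasting the URLs into the code, assuming you can export them to a plain text file with one URL per line, is to read that file and apply the same split. This is a rough sketch: the file name urls.txt and the helper ids_from_file are placeholders of mine, and the demo writes a temporary file only so the snippet runs on its own.

```python
import tempfile
from pathlib import Path


def ids_from_file(path):
    """Return the last path segment of every non-blank line in the file."""
    ids = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        url = line.strip()
        if url:  # skip blank lines
            # Drop a trailing slash, then keep what follows the last '/'.
            ids.append(url.rstrip("/").rsplit("/", 1)[-1])
    return ids


# Demo: write two URLs from the question to a temporary file and read them back.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "urls.txt"  # hypothetical file name
    sample.write_text(
        "https://www.net-a-porter.com/en-gb/shop/product/loewe/bags/clutch-bags/flamenco-mini-leather-clutch/17411127376741793\n"
        "\n"
        "https://www.net-a-porter.com/en-gb/shop/product/tom-ford/bags/cross-body/padlock-mini-textured-leather-shoulder-bag/13452677153399511\n",
        encoding="utf-8",
    )
    demo_ids = ids_from_file(sample)

print(demo_ids)
```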
Upvotes: 2
Reputation: 14083
Try a list comprehension with rsplit:
l = ['https://www.net-a-porter.com/en-gb/shop/product/loewe/bags/clutch-bags/flamenco-mini-leather-clutch/17411127376741793',
'https://www.net-a-porter.com/en-gb/shop/product/saint-laurent/bags/cross-body/loulou-medium-quilted-leather-shoulder-bag/20346390236017004',
'https://www.net-a-porter.com/en-gb/shop/product/saint-laurent/bags/cross-body/solferino-medium-leather-shoulder-bag/20346390236017013',
'https://www.net-a-porter.com/en-gb/shop/product/saint-laurent/bags/shoulder-bags/niki-baby-mini-quilted-crinkled-glossed-leather-shoulder-bag/20346390236017058',
'https://www.net-a-porter.com/en-gb/shop/product/loewe/bags/clutch-bags/flamenco-mini-leather-clutch/17411127376741793',
'https://www.net-a-porter.com/en-gb/shop/product/saint-laurent/bags/cross-body/lou-medium-quilted-leather-shoulder-bag/20346390236017001',
'https://www.net-a-porter.com/en-gb/shop/product/saint-laurent/bags/cross-body/solferino-small-leather-shoulder-bag/20346390236017074',
'https://www.net-a-porter.com/en-gb/shop/product/transience/bags/shoulder-bags/fortune-shell-shoulder-bag/16114163150620652',
'https://www.net-a-porter.com/en-gb/shop/product/jimmy-choo/bags/shoulder-bags/callie-tasseled-chainmail-trimmed-crinkled-leather-shoulder-bag/13452677151806735',
'https://www.net-a-porter.com/en-gb/shop/product/tom-ford/bags/cross-body/padlock-mini-textured-leather-shoulder-bag/13452677153399511']
[url.rsplit('/', 1)[1] for url in l]
['17411127376741793',
'20346390236017004',
'20346390236017013',
'20346390236017058',
'17411127376741793',
'20346390236017001',
'20346390236017074',
'16114163150620652',
'13452677151806735',
'13452677153399511']
If you want, you can read the file using pandas:
import pandas as pd
df = pd.read_excel('path/to/file.xlsx')
# n must be passed by keyword in pandas 2.0 and later
df['urls'].str.rsplit('/', n=1, expand=True)[1]  # assumes the column is named urls
0 17411127376741793
1 20346390236017004
2 20346390236017013
3 20346390236017058
4 17411127376741793
5 20346390236017001
6 20346390236017074
7 16114163150620652
8 13452677151806735
9 13452677153399511
Name: 1, dtype: object
Upvotes: 3