Peters7

Reputation: 55

Extract numbers from multiple URLs

I want to extract only the numbers from multiple URLs.

Does anyone know how to do this?

Thanks in advance.

Here are the URLs:

https://www.net-a-porter.com/en-gb/shop/product/loewe/bags/clutch-bags/flamenco-mini-leather-clutch/17411127376741793
https://www.net-a-porter.com/en-gb/shop/product/saint-laurent/bags/cross-body/loulou-medium-quilted-leather-shoulder-bag/20346390236017004
https://www.net-a-porter.com/en-gb/shop/product/saint-laurent/bags/cross-body/solferino-medium-leather-shoulder-bag/20346390236017013
https://www.net-a-porter.com/en-gb/shop/product/saint-laurent/bags/shoulder-bags/niki-baby-mini-quilted-crinkled-glossed-leather-shoulder-bag/20346390236017058
https://www.net-a-porter.com/en-gb/shop/product/loewe/bags/clutch-bags/flamenco-mini-leather-clutch/17411127376741793
https://www.net-a-porter.com/en-gb/shop/product/saint-laurent/bags/cross-body/lou-medium-quilted-leather-shoulder-bag/20346390236017001
https://www.net-a-porter.com/en-gb/shop/product/saint-laurent/bags/cross-body/solferino-small-leather-shoulder-bag/20346390236017074
https://www.net-a-porter.com/en-gb/shop/product/transience/bags/shoulder-bags/fortune-shell-shoulder-bag/16114163150620652
https://www.net-a-porter.com/en-gb/shop/product/jimmy-choo/bags/shoulder-bags/callie-tasseled-chainmail-trimmed-crinkled-leather-shoulder-bag/13452677151806735
https://www.net-a-porter.com/en-gb/shop/product/tom-ford/bags/cross-body/padlock-mini-textured-leather-shoulder-bag/13452677153399511

I want to get a result like this:

17411127376741793
20346390236017004
20346390236017013
20346390236017058
17411127376741793
20346390236017001
20346390236017074
16114163150620652
13452677151806735
13452677153399511

Upvotes: 1

Views: 46

Answers (3)

noahtf13

Reputation: 333

Assuming the URLs are in a list like so:

items_list = [
    'https://www.net-a-porter.com/en-gb/shop/product/loewe/bags/clutch-bags/flamenco-mini-leather-clutch/17411127376741793',
    'https://www.net-a-porter.com/en-gb/shop/product/saint-laurent/bags/cross-body/loulou-medium-quilted-leather-shoulder-bag/20346390236017004',
    'https://www.net-a-porter.com/en-gb/shop/product/saint-laurent/bags/cross-body/solferino-medium-leather-shoulder-bag/20346390236017013',
    'https://www.net-a-porter.com/en-gb/shop/product/saint-laurent/bags/shoulder-bags/niki-baby-mini-quilted-crinkled-glossed-leather-shoulder-bag/20346390236017058',
    'https://www.net-a-porter.com/en-gb/shop/product/loewe/bags/clutch-bags/flamenco-mini-leather-clutch/17411127376741793',
    'https://www.net-a-porter.com/en-gb/shop/product/saint-laurent/bags/cross-body/lou-medium-quilted-leather-shoulder-bag/20346390236017001',
    'https://www.net-a-porter.com/en-gb/shop/product/saint-laurent/bags/cross-body/solferino-small-leather-shoulder-bag/20346390236017074',
    'https://www.net-a-porter.com/en-gb/shop/product/transience/bags/shoulder-bags/fortune-shell-shoulder-bag/16114163150620652',
    'https://www.net-a-porter.com/en-gb/shop/product/jimmy-choo/bags/shoulder-bags/callie-tasseled-chainmail-trimmed-crinkled-leather-shoulder-bag/13452677151806735',
    'https://www.net-a-porter.com/en-gb/shop/product/tom-ford/bags/cross-body/padlock-mini-textured-leather-shoulder-bag/13452677153399511'
]

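You just need to collect the first run of digits from each URL:

import re

num_values = []
for item in items_list:
    # findall returns every run of digits; these URLs contain only one, the ID
    num_values.append(re.findall(r"[0-9]+", item)[0])
print(num_values)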

Output:

['17411127376741793',
 '20346390236017004',
 '20346390236017013',
 '20346390236017058',
 '17411127376741793',
 '20346390236017001',
 '20346390236017074',
 '16114163150620652',
 '13452677151806735',
 '13452677153399511']
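A slightly more defensive variant of the loop above, offered as a sketch rather than a definitive fix: anchoring the pattern to the end of the string means digits elsewhere in the URL (a hypothetical slug such as `bag-2`) are not picked up by mistake. The names `TRAILING_ID` and `extract_id` are made up for illustration, and the shortened `urls` list is only sample input.

```python
import re

# A run of digits at the very end of the URL, with an optional trailing slash.
TRAILING_ID = re.compile(r"(\d+)/?$")

def extract_id(url):
    """Return the trailing numeric ID as a string, or None if there is none."""
    match = TRAILING_ID.search(url)
    return match.group(1) if match else None

urls = [
    'https://www.net-a-porter.com/en-gb/shop/product/loewe/bags/clutch-bags/flamenco-mini-leather-clutch/17411127376741793',
    'https://www.net-a-porter.com/en-gb/shop/product/tom-ford/bags/cross-body/padlock-mini-textured-leather-shoulder-bag/13452677153399511',
]

ids = [extract_id(u) for u in urls]
print(ids)
```

Returning None for a URL without a trailing number avoids the IndexError that indexing an empty `findall` result with `[0]` would raise.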

Upvotes: 1

nagyl

Reputation: 1644

You can split each URL on the / character and keep the last part.

urls = [
    'https://www.net-a-porter.com/en-gb/shop/product/loewe/bags/clutch-bags/flamenco-mini-leather-clutch/17411127376741793',
    #...links
]

for url in urls:
    num = url.split('/')[-1] #last element
    print(num)

The downside of this method is that you need to copy each URL into your code. If the URLs are stored in an Excel file, you can read them from the file instead.
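One caveat with a plain split: a trailing slash or a query string would end up in (or replace) the last part. A minimal sketch that guards against this, assuming the ID is always the final path segment, parses the URL with the standard library first. The helper name `last_path_segment` and the `utm_source` query string are hypothetical and only there for illustration.

```python
from urllib.parse import urlparse

def last_path_segment(url):
    """Return the final non-empty path segment, ignoring query string and fragment."""
    path = urlparse(url).path  # the path only, without '?query' or '#fragment'
    return path.rstrip('/').rsplit('/', 1)[-1]

url = 'https://www.net-a-porter.com/en-gb/shop/product/loewe/bags/clutch-bags/flamenco-mini-leather-clutch/17411127376741793'

plain = last_path_segment(url)
with_query = last_path_segment(url + '/?utm_source=example')
print(plain)
print(with_query)
```

Both calls should give the same ID, whereas `url.split('/')[-1]` on the second URL would return the query string instead.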

Upvotes: 2

It_is_Chris

Reputation: 14083

Try a list comprehension with rsplit:

l = ['https://www.net-a-porter.com/en-gb/shop/product/loewe/bags/clutch-bags/flamenco-mini-leather-clutch/17411127376741793',
'https://www.net-a-porter.com/en-gb/shop/product/saint-laurent/bags/cross-body/loulou-medium-quilted-leather-shoulder-bag/20346390236017004',
'https://www.net-a-porter.com/en-gb/shop/product/saint-laurent/bags/cross-body/solferino-medium-leather-shoulder-bag/20346390236017013',
'https://www.net-a-porter.com/en-gb/shop/product/saint-laurent/bags/shoulder-bags/niki-baby-mini-quilted-crinkled-glossed-leather-shoulder-bag/20346390236017058',
'https://www.net-a-porter.com/en-gb/shop/product/loewe/bags/clutch-bags/flamenco-mini-leather-clutch/17411127376741793',
'https://www.net-a-porter.com/en-gb/shop/product/saint-laurent/bags/cross-body/lou-medium-quilted-leather-shoulder-bag/20346390236017001',
'https://www.net-a-porter.com/en-gb/shop/product/saint-laurent/bags/cross-body/solferino-small-leather-shoulder-bag/20346390236017074',
'https://www.net-a-porter.com/en-gb/shop/product/transience/bags/shoulder-bags/fortune-shell-shoulder-bag/16114163150620652',
'https://www.net-a-porter.com/en-gb/shop/product/jimmy-choo/bags/shoulder-bags/callie-tasseled-chainmail-trimmed-crinkled-leather-shoulder-bag/13452677151806735',
'https://www.net-a-porter.com/en-gb/shop/product/tom-ford/bags/cross-body/padlock-mini-textured-leather-shoulder-bag/13452677153399511']

[url.rsplit('/', 1)[1] for url in l]

['17411127376741793',
 '20346390236017004',
 '20346390236017013',
 '20346390236017058',
 '17411127376741793',
 '20346390236017001',
 '20346390236017074',
 '16114163150620652',
 '13452677151806735',
 '13452677153399511']

If you want, you can read the file using pandas:

import pandas as pd
df = pd.read_excel('path/to/file.xlsx')
# assumes the column is named 'urls'; n is passed by keyword, as newer pandas versions require
df['urls'].str.rsplit('/', n=1, expand=True)[1]

0    17411127376741793
1    20346390236017004
2    20346390236017013
3    20346390236017058
4    17411127376741793
5    20346390236017001
6    20346390236017074
7    16114163150620652
8    13452677151806735
9    13452677153399511
Name: 1, dtype: object

Upvotes: 3
