boot-scootin

Reputation: 12515

String items in list: how to remove certain keywords?

I have a list of links that looks like the following:

links = ['http://www.website.com/category/subcategory/1',
'http://www.website.com/category/subcategory/2',
'http://www.website.com/category/subcategory/3',...]

I want to extract the 1, 2, 3, and so on from this list, and store the extracted data in subcategory_explicit. They're stored as str, and I'm having trouble getting at them with the following code:

subcategory_explicit = [cat.get('subcategory') for cat in links if cat.get('subcategory') is not None]

Do I have to change my data type from str to something else? What would be a better way to obtain and store the extracted values?

Upvotes: 0

Views: 43

Answers (2)

alex

Reputation: 94

Try this (using re module):

import re

links = [
    'http://www.website.com/category/subcategory/1',
    'http://www.website.com/category/subcategory/2',
    'http://www.website.com/category/subcategory/3']

d = "|".join(links)
# 'http://www.website.com/category/subcategory/1|http://www.website.com/category/subcategory/2|http://www.website.com/category/subcategory/3'

# Capture the trailing number, not the subcategory name; the raw string keeps \w and \d intact.
pattern = re.compile(r"/category/\w+/(?P<subcategory_id>\d+)", re.I)
subcategory_explicit = pattern.findall(d)

print(subcategory_explicit)
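
A variant of the same idea, sketched under the assumption that every link of interest ends in a numeric ID: search each link on its own instead of joining them into one string. The name trailing_number is made up for this illustration.

```python
import re

links = [
    'http://www.website.com/category/subcategory/1',
    'http://www.website.com/category/subcategory/2',
    'http://www.website.com/category/subcategory/3',
]

# Match the digits that follow the last slash at the end of the string.
trailing_number = re.compile(r"/(\d+)$")

subcategory_explicit = []
for link in links:
    match = trailing_number.search(link)
    if match:  # skip links that do not end in a number
        subcategory_explicit.append(match.group(1))

print(subcategory_explicit)  # ['1', '2', '3']
```

The values stay as str; wrap match.group(1) in int() if numbers are needed.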

Upvotes: 1

Heman Gandhi

Reputation: 1371

subcategory_explicit = [i[i.find('subcategory'):] for i in links if 'subcategory' in i]

This slices each string from the "s" in "subcategory" to the end of the string, so each result looks like 'subcategory/1'. By adding len('subcategory') to the value from find, you can exclude "subcategory" and get "/#" (where # is whatever number); adding one more to the index skips the slash as well.
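
The offset described above could be sketched as follows, assuming every link of interest contains the text 'subcategory/' directly before the number. The names marker and subcategory_numbers are introduced here only for illustration.

```python
links = [
    'http://www.website.com/category/subcategory/1',
    'http://www.website.com/category/subcategory/2',
    'http://www.website.com/category/subcategory/3',
]

# Include the trailing slash in the marker so the slice starts at the number.
marker = 'subcategory/'

subcategory_explicit = [
    link[link.find(marker) + len(marker):]  # everything after the marker
    for link in links
    if marker in link  # skip links without the marker
]

print(subcategory_explicit)  # ['1', '2', '3']

# Optional: convert the extracted strings to integers.
subcategory_numbers = [int(value) for value in subcategory_explicit]
```

No change of data type is needed for the links themselves: str has no get method, but slicing and find work on it directly.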

Upvotes: 1
