Reputation: 2897
I am extracting content into a list with Scrapy. Each element contains the unwanted characters ": ", which I would like to remove as efficiently as possible.
v = response.xpath('//div[@id="tab"]/text()').extract()
>>> v
['Marke:', 'Modell:']
>>> for i in v : re.sub(r'[^\w]', '', i)
...
'Marke'
'Modell'
Now that seems to work, but how can I retain the result?
In my code, v hasn't changed:
>>> v
['Marke:', 'Modell:']
Upvotes: 1
Reputation: 77837
I think that pulling in regex for this is a little overkill: use the string replace method:
v = ['Marke:', 'Modell:']
v = [s.replace(':', '') for s in v]
print(v)
Output:
['Marke', 'Modell']
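One thing to keep in mind: replace removes every colon in the string, not only the trailing one. If the real data can contain colons in the middle of a value, stripping from the right end may be closer to what you want. A small sketch of the difference, using made-up sample values (the 'Baujahr' entry is hypothetical and not from the question):

```python
# Hypothetical sample data; only 'Marke:' and 'Modell:' come from the question.
v = ['Marke:', 'Modell: ', 'Baujahr: 2015:']

# replace() removes every colon, including ones in the middle of the string.
replaced = [s.replace(':', '') for s in v]

# rstrip(': ') only removes colons and spaces from the right end.
stripped = [s.rstrip(': ') for s in v]

print(replaced)
print(stripped)
```

Which one fits depends on what the scraped values actually look like.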
Upvotes: 1
Reputation: 19654
You can solve this with a list comprehension:
>>> v = response.xpath('//div[@id="tab"]/text()').extract()
>>>
>>> import re
>>> v = [re.sub(r'[^\w]', '', i) for i in v]
>>> v
['Marke', 'Modell']
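The loop in the question printed the cleaned strings but left v untouched because Python strings are immutable: re.sub returns a new string, and the interactive prompt simply echoed it before it was thrown away. The comprehension above rebinds v to a new list. If other parts of your code hold a reference to the same list and should see the change too, slice assignment is one option. A minimal sketch, assuming a plain list stands in for the extract() result; the name alias is only there for illustration:

```python
import re

v = ['Marke:', 'Modell:']  # stand-in for the extract() result
alias = v                  # hypothetical second reference to the same list

# Same pattern as in the answer, compiled once and reused for every element.
pattern = re.compile(r'[^\w]')

# pattern.sub returns a new string; slice assignment stores the results
# back into the existing list object instead of creating a new list.
v[:] = [pattern.sub('', i) for i in v]

print(v)
print(alias)
```

Both names show the cleaned list, because they still refer to the same list object.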
Upvotes: 3