I want to scrape text from a website with Scrapy. Here is my sample code:
def parse(self, response):
    for kamusset in response.css("div#d1"):
        text = kamusset.css("div b::text").extract()
        print(dict(text=text))
I want to remove the '.' characters and every digit, so I used a regular expression and changed my code to:
def parse(self, response):
    for kamusset in response.css("div#d1"):
        text = kamusset.css("div b::text").re(r'[a-z]+')
        print(dict(text=text))
That is not the result I expected. I want to get this instead:
{'text': ['abadi', 'mengabadi', 'mengabadikan', 'pengabadian', 'keabadian']}
How can I do that?
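Why the `.re(r'[a-z]+')` version falls short: `.re()` returns every match of the pattern as a separate list item, so each '.' splits a word into fragments instead of being deleted. A minimal sketch reproducing this with plain `re.findall`, assuming the scraped text nodes look like `'aba.di'` (the sample values below are hypothetical):

```python
import re

# Hypothetical sample of text nodes shaped like the scraped ones
nodes = ['aba.di', 'meng.a.ba.di', '1']

# Scrapy's .re() applies the pattern to each node and flattens all matches
# into one list; re.findall per node mimics that behaviour.
matches = [m for node in nodes for m in re.findall(r'[a-z]+', node)]
print(matches)  # ['aba', 'di', 'meng', 'a', 'ba', 'di']
```

The digits vanish because they never match, but the words come back in pieces, which is why the letters need to be joined back together rather than matched separately.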
You can clean up the text you scraped with the re module:
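import re

# Sample scraped values: dotted syllables mixed with stray numbers
text = ['aba.di', 'meng.a.ba.di', 'meng.a.ba.di.kan', '1', '2', 'peng.a.ba.di.an', 'ke.a.ba.di.an', '1', '2']
# Delete every run of non-letter characters ('.' and digits alike)
stack = [re.sub('[^a-zA-Z]+', '', e) for e in text]
# Number-only entries are now empty strings, so filter them out
text_new = [i for i in stack if i != ""]
print(text_new)
text_new will be:
['abadi', 'mengabadi', 'mengabadikan', 'pengabadian', 'keabadian']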
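To do the same cleanup inside the spider, one option is to wrap that logic in a small helper and call it on the `.extract()` result. This is a sketch: the helper name `clean_words` is hypothetical, and the spider usage in the comment assumes the `div#d1` / `div b::text` selectors from the question.

```python
import re
from typing import Iterable, List


def clean_words(raw: Iterable[str]) -> List[str]:
    """Remove every non-letter character, then drop entries left empty."""
    cleaned = (re.sub(r'[^a-zA-Z]+', '', s) for s in raw)
    return [w for w in cleaned if w]


# Assumed usage inside the spider:
#     def parse(self, response):
#         for kamusset in response.css("div#d1"):
#             text = clean_words(kamusset.css("div b::text").extract())
#             yield {"text": text}

sample = ['aba.di', 'meng.a.ba.di', '1', 'ke.a.ba.di.an', '2']
print(clean_words(sample))  # ['abadi', 'mengabadi', 'keabadian']
```

Yielding the dict from `parse` instead of printing it lets Scrapy collect the result as an item, for example with `scrapy crawl <spider> -o out.json`.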