Reputation: 899
I want to crawl data from a website. I am using this code:
import scrapy
class KamusSetSpider(scrapy.Spider):
    name = "kamusset_spider"
    start_urls = ['http://kbbi.web.id/abadi']
    def parse(self, response):
        SET_SELECTOR = '.tur highlight'
        for brickset in response.css(SET_SELECTOR):
            yield {
                'name': brickset.css(SET_SELECTOR).extract_first(),
            }
and this is what the element inspector shows:
I want to get every text in the red oval, such as mengabadi, mengabadikan, etc. The 'b' tag has multiple classes => tur highlight. However, I do not get any results.
What is the problem, and how can I solve it? I have changed my code to this:
def parse(self, response):
    for kamusset in response.css("div#d1"):
        text = kamusset.css("div.sub_17 b.tur.highlight::text").extract()
        print(dict(text=text))
but it is still not working. It returns null.
Upvotes: 3
Views: 7232
Reputation: 31
As far as I understand your question, you want to extract text from different places (tags) with different class names using a single CSS selector:
text = kamusset.css("div.sub_17::text, b.tur.highlight::text").extract()
This should work.
Upvotes: 2
Reputation: 1529
This should do it:
text = response.xpath('//b[@class="tur highlight"]/text()').extract()
It returns a list with all occurrences.
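One caveat with the XPath above: @class="tur highlight" compares the whole attribute string, so it only matches when the classes appear in exactly that order and spelling. The sketch below illustrates this with the standard library's xml.etree.ElementTree as a stand-in for Scrapy's selectors. The markup is hypothetical and well-formed, and is not taken from the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup (an assumption, not the real page source).
html = """<div id="d1">
  <b class="tur highlight">mengabadi</b>
  <b class="highlight tur">mengabadikan</b>
  <b class="tur">abadi</b>
</div>"""

root = ET.fromstring(html)

# Exact string comparison on the class attribute: class order matters.
exact = [b.text for b in root.findall(".//b[@class='tur highlight']")]

# Token-based comparison: treat class as a whitespace-separated set.
wanted = {"tur", "highlight"}
by_token = [
    b.text
    for b in root.iter("b")
    if wanted <= set(b.get("class", "").split())
]

print(exact)
print(by_token)
```

In Scrapy itself, a CSS selector such as b.tur.highlight matches on class tokens, so it is usually less fragile than an exact @class comparison.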
Upvotes: 0
Reputation: 12845
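The selector '.tur highlight' means: select <highlight> elements inside all elements with class tur. The space is a descendant combinator, so 'highlight' is read as a tag name, not a class.
To select elements that have multiple classes, use a selector without whitespace: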
SET_SELECTOR = '.tur.highlight'
Upvotes: 4