Reputation: 59
I want to store the data from a list on a web page. When I run the commands
response.css('title::text').extract_first() and
response.css("article div#section-2 li::text").extract()
individually in the scrapy shell, each one shows the expected output. Below is my code, which does not store the data in JSON or CSV format:
import scrapy

class QuotesSpider(scrapy.Spider):
    name = "medical"
    start_urls = ['https://medlineplus.gov/ency/article/000178.html/']

    def parse(self, response):
        yield
        {
            'topic': response.css('title::text').extract_first(),
            'symptoms': response.css("article div#section-2 li::text").extract()
        }
I have tried to run this code using
scrapy crawl medical -o medical.json
Upvotes: 1
Views: 251
Reputation: 473863
You need to fix your URL: it is https://medlineplus.gov/ency/article/000178.htm and not https://medlineplus.gov/ency/article/000178.html/.
Also, and more importantly, define an Item class and yield/return it from the parse() callback of your spider:
import scrapy

class MyItem(scrapy.Item):
    topic = scrapy.Field()
    symptoms = scrapy.Field()

class QuotesSpider(scrapy.Spider):
    name = "medical"
    allowed_domains = ['medlineplus.gov']
    start_urls = ['https://medlineplus.gov/ency/article/000178.htm']

    def parse(self, response):
        item = MyItem()
        item["topic"] = response.css('title::text').extract_first()
        item["symptoms"] = response.css("article div#section-2 li::text").extract()
        yield item
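
A further detail that may explain the empty output, independent of the URL: in the question's parse(), yield sits on its own line and the dict starts on the next one. Python treats that as a bare yield (which emits None) followed by a dict expression whose value is thrown away. The sketch below reproduces this with plain generators and no Scrapy at all; parse_broken, parse_fixed, and the sample values are hypothetical stand-ins, not part of the original spider. Scrapy callbacks can also yield plain dicts, so this is offered as a possible contributing cause rather than a definitive diagnosis.

```python
def parse_broken(fields):
    # Bare `yield` emits None; the dict on the following lines is a
    # separate expression statement, so its value is discarded.
    yield
    {
        'topic': fields['topic'],
        'symptoms': fields['symptoms'],
    }


def parse_fixed(fields):
    # The opening brace is on the same line as `yield`,
    # so the dict itself is the yielded value.
    yield {
        'topic': fields['topic'],
        'symptoms': fields['symptoms'],
    }


# Hypothetical stand-in for what the CSS selectors would return.
sample = {'topic': 'Example topic', 'symptoms': ['symptom A', 'symptom B']}

broken_items = list(parse_broken(sample))
fixed_items = list(parse_fixed(sample))

print(broken_items)  # [None]
print(fixed_items)
```

With the Item-based spider above this pitfall does not arise, because the item is built first and then yielded on a single line.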
Upvotes: 1