Reputation: 201
I am writing a Scrapy spider to scrape the following page, https://www.trollandtoad.com/magic-the-gathering/aether-revolt/10066, but it only scrapes the first line of data and not the rest. I think it has something to do with my for loop, but when I make the loop broader it outputs too much data: each line of data is output multiple times.
    def parse(self, response):
        item = GameItem()
        saved_name = ""
        for game in response.css("div.row.mt-1.list-view"):
            saved_name = game.css("a.card-text::text").get() or saved_name
            item["Card_Name"] = saved_name.strip()
            if item["Card_Name"] != None:
                saved_name = item["Card_Name"].strip()
            else:
                item["Card_Name"] = saved_name
            yield item
UPDATE #1
    def parse(self, response):
        for game in response.css('div.card > div.row'):
            item = GameItem()
            item["Card_Name"] = game.css("a.card-text::text").get()
            for buying_option in game.css('div.buying-options-table div.row:not(:first-child)'):
                item["Condition"] = game.css("div.col-3.text-center.p-1::text").get()
                item["Price"] = game.css("div.col-2.text-center.p-1::text").get()
                yield item
Upvotes: 0
Views: 278
Reputation: 1035
Your code does not work because you are creating the GameItem() outside of your loop. I must have missed the memo about the .get() and .getall() methods. Maybe someone can comment on how they differ from extract?
Your failing code:
    def parse(self, response):
        item = GameItem()  # this creates only one GameItem per page
        saved_name = ""
        # This selector matches the single wrapper around all the items on the
        # page, so the loop does not visit them one by one. See the code below
        # for the corrected selector.
        for game in response.css("div.row.mt-1.list-view"):
            saved_name = game.css("a.card-text::text").get() or saved_name
            item["Card_Name"] = saved_name.strip()
            if item["Card_Name"] != None:
                saved_name = item["Card_Name"].strip()
            else:
                item["Card_Name"] = saved_name
            yield item
Fixed code to solve your problem:
    def parse(self, response):
        for game in response.css("div.product-col"):
            item = GameItem()  # a new item for every card
            item["Card_Name"] = game.css("a.card-text::text").get()
            if not item["Card_Name"]:
                # No card name: skip to the next element. A bare "return"
                # would instead end parse() and skip every remaining element,
                # so only use it if nothing after this point should run.
                continue
            yield item
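To illustrate why creating the item once outside the loop is risky, here is a minimal sketch in plain Python. It uses a dict as a stand-in for GameItem and a made-up list of names in place of the selector results (both are assumptions for illustration), so it runs without Scrapy. Whether this aliasing shows up in real output depends on when each item gets serialized, so treat it as an illustration of the risk, not a full diagnosis.

```python
# Placeholder card names standing in for the scraped selector results.
NAMES = ["Card A", "Card B", "Card C"]


def parse_shared(names):
    """Creates the item once, then mutates and yields the same object."""
    item = {}  # stand-in for GameItem(), created outside the loop
    for name in names:
        item["Card_Name"] = name
        yield item


def parse_fresh(names):
    """Creates a new item on every iteration."""
    for name in names:
        item = {}  # stand-in for GameItem(), created inside the loop
        item["Card_Name"] = name
        yield item


shared = list(parse_shared(NAMES))
fresh = list(parse_fresh(NAMES))

# Every entry in `shared` is the same dict, so all of them show the last name.
print([i["Card_Name"] for i in shared])
# Each entry in `fresh` is its own dict and keeps its own name.
print([i["Card_Name"] for i in fresh])
```

Once the items are collected, the shared version holds three references to one object, while the fresh version holds three separate items.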
Upvotes: 0
Reputation: 10666
I think you need the CSS below (later you can use it as a base to process the buying-options container):
    def parse(self, response):
        for game in response.css('div.card > div.row'):
            item = GameItem()
            # default="" avoids calling .strip() on None when there is no card name
            card_name = game.css("a.card-text::text").get(default="")
            item["Card_Name"] = card_name.strip()
            for buying_option in game.css('div.buying-options-table div.row:not(:first-child)'):
                # process buying_option here;
                # you may need to move the GameItem() initialization inside this loop
                pass
            yield item
As you can see, I moved item = GameItem() inside the loop. There is also no need for saved_name here.
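If you want one item per buying option, a rough sketch of the nested loop could look like the following. To keep it runnable without Scrapy, it swaps in the standard library's xml.etree.ElementTree for Scrapy selectors and a plain dict for GameItem. The markup, the "condition" and "price" class names, and the values are all invented for this example; the real page will differ. The point is only that the item is created inside the inner loop and that the fields are read from buying_option, not from the outer game element.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one card row with a header row and two buying options.
PAGE = """
<div class="card">
  <div class="row">
    <a class="card-text"> Card A </a>
    <div class="buying-options-table">
      <div class="row"><span>Condition</span><span>Price</span></div>
      <div class="row"><span class="condition">Near Mint</span><span class="price">$1.00</span></div>
      <div class="row"><span class="condition">Played</span><span class="price">$0.50</span></div>
    </div>
  </div>
</div>
"""


def parse(card):
    for game in card.findall("./div[@class='row']"):
        card_name = (game.findtext("./a[@class='card-text']") or "").strip()
        table = game.find("./div[@class='buying-options-table']")
        rows = table.findall("./div[@class='row']") if table is not None else []
        for buying_option in rows[1:]:  # skip the header row
            item = {"Card_Name": card_name}  # new item per buying option
            # Read the fields from buying_option, not from the outer game.
            item["Condition"] = buying_option.findtext("./span[@class='condition']")
            item["Price"] = buying_option.findtext("./span[@class='price']")
            yield item


items = list(parse(ET.fromstring(PAGE)))
for item in items:
    print(item)
```

In a real spider, the same shape would use game.css(...) for the outer loop and buying_option.css(...) for the fields inside the inner loop.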
Upvotes: 2
Reputation: 321
response.css("div.row.mt-1.list-view") returns only one selector, so the code in your loop runs only once. Try for game in response.css(".mt-1.list-view .card-text"): instead, and you will get a list of selectors to loop over.
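A quick way to check this kind of problem is to count what each selector matches before looping over it. The sketch below shows the idea on made-up markup, using the standard library's xml.etree.ElementTree instead of Scrapy selectors so that it runs without installing anything. The markup is an assumption, and ElementTree compares the whole class attribute string, unlike a CSS class selector.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one wrapper div containing three card links.
PAGE = """
<body>
  <div class="row mt-1 list-view">
    <a class="card-text">Card A</a>
    <a class="card-text">Card B</a>
    <a class="card-text">Card C</a>
  </div>
</body>
"""

root = ET.fromstring(PAGE)

# Selecting the wrapper matches a single element, so a loop over it runs once.
wrappers = root.findall(".//div[@class='row mt-1 list-view']")

# Selecting the links inside the wrapper matches one element per card.
links = wrappers[0].findall(".//a[@class='card-text']")

print(len(wrappers), len(links))
names = [a.text for a in links]
print(names)
```

In the Scrapy shell, the equivalent check would be calling len() on the result of response.css(...) for each candidate selector.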
Upvotes: 0