Reputation: 201
I'm trying to scrape prices from a website, but some prices are crossed out with a new price shown next to them, so I was getting null for those rows. I added an if statement to pick up the price in that case, and it partly worked: instead of the new price I get the crossed-out price, because the identifiers are the same for both. Any ideas on how to fix this?
for game in response.css("tr[class^=deckdbbody]"):
    # Initialize saved_name to the extracted card name
    saved_name = game.css("a.card_popup::text").extract_first() or saved_name
    # Now call item and set equal to saved_name and strip leading '\n' from output
    item["Card_Name"] = saved_name.strip()
    # Check to see if output is null, in the case that there are two different conditions for one card
    if item["Card_Name"] is not None:
        # If not null then store value in saved_name
        saved_name = item["Card_Name"].strip()
    else:
        # If null then set null value to previous card name since if there is a null value you should have the same card name twice
        item["Card_Name"] = saved_name
    # Call item again in order to extract the condition, stock, and price using the corresponding html code from the website
    item["Condition"] = game.css("td[class^=deckdbbody].search_results_7 a::text").get()
    item["Stock"] = game.css("td[class^=deckdbbody].search_results_8::text").extract_first()
    item["Price"] = game.css("td[class^=deckdbbody].search_results_9::text").extract_first()
    if item["Price"] is None:
        item["Price"] = game.css("td[class^=deckdbbody].search_results_9 span::text").get()
    # Return values
    yield item
Upvotes: 0
Views: 427
Reputation: 201
Here is what ended up working:
if item["Price"] is None:
    item["Price"] = game.css("td[class^=deckdbbody].search_results_9 span[style*='color:red']::text").get()
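For anyone wondering why the narrower selector matters, here is a small stdlib-only sketch of the same idea. The markup below is an assumption pieced together from this thread (a struck-through span for the old price and a red span for the new one), not a copy of the real page, and current_price is a hypothetical helper. Instead of matching color:red, it skips any span whose style contains line-through, which is a different test from the selector above. It uses xml.etree.ElementTree, which only handles well-formed markup, so inside the spider the Scrapy selector is still the practical choice.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, assumed for illustration only: the old price is
# struck through and the new price sits in a red span in the same cell.
CELL = (
    '<td class="deckdbbody search_results_9">'
    '<span style="text-decoration:line-through;">$4.99</span>'
    '<span style="color:red;">$3.49</span>'
    '</td>'
)


def current_price(cell_html):
    """Return the price text that is not struck through, or None."""
    cell = ET.fromstring(cell_html)
    # A cell without spans carries the price as its own text.
    if cell.text and cell.text.strip():
        return cell.text.strip()
    for span in cell.iter("span"):
        # Normalize the style string so spacing and case do not matter.
        style = (span.get("style") or "").replace(" ", "").lower()
        if "line-through" not in style and span.text:
            return span.text.strip()
    return None


print(current_price(CELL))
print(current_price('<td class="deckdbbody search_results_9">$1.25</td>'))
```

Excluding line-through rather than requiring color:red might hold up better if the site changes the highlight color, but that is a guess about the page rather than something confirmed here.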
Upvotes: 0
Reputation: 1049
You need to take the style attribute into account when scraping: style="text-decoration:line-through" marks the prices you do not want.
For that you could use BeautifulSoup, assuming the prices that are not crossed out have no style attribute:
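from bs4 import BeautifulSoup as bs
import requests as r

# url is the address of the page being scraped
response = r.get(url)
soup = bs(response.content, "html.parser")
# Call find_all on the parsed soup, not on the BeautifulSoup class
decks = soup.find_all('td', {'class': 'deckdbbody', 'style': None})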
Now get the text content inside each one, which is the price:
prices = [d.getText().strip() for d in decks]
With your update, I can see that the prices list will contain unwanted entries, because many td elements use this class without holding a price. An easy fix is to check for a dollar sign in the .getText() result:
final = []
for price in prices:
    if '$' in price:
        final.append(price)
Now final only has what you really want.
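The same filter can be written as one small function that also turns the kept strings into numbers, which may help if the prices are going to be sorted or compared. This is only a sketch: keep_prices and the sample list are made up for illustration, and it assumes each kept string holds nothing but a dollar amount such as $1,250.00.

```python
from decimal import Decimal


def keep_prices(texts):
    """Keep only strings containing a dollar sign, parsed as Decimal amounts."""
    amounts = []
    for text in texts:
        if "$" not in text:
            continue  # not a price cell, e.g. a condition or stock count
        # Drop the currency symbol and thousands separators before parsing.
        cleaned = text.strip().replace("$", "").replace(",", "")
        amounts.append(Decimal(cleaned))
    return amounts


# Made-up cell texts standing in for the scraped td contents.
sample = ["Near Mint", "$3.49", "12", " $1,250.00 "]
print(keep_prices(sample))
```

Decimal is used instead of float so that amounts like 3.49 keep their exact value. A string with extra words around the amount would raise an error here, so real data may need a stricter cleanup step.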
Upvotes: 1