Tim

Reputation: 201

Difficulty scraping some prices off a website

I'm trying to scrape prices off a website, but some prices are crossed out with a new price shown next to them, and I was getting null for those. I added an if statement to handle that case, and it partly worked: instead of the new price I now get the crossed-out price, because the identifiers are the same for both. Any ideas on how to fix this?

[Screenshots: HTML of a crossed-out price; HTML of a price that is not crossed out]

for game in response.css("tr[class^=deckdbbody]"):
    # Use the extracted card name, or fall back to the previous one when the row has none
    saved_name = game.css("a.card_popup::text").extract_first() or saved_name
    # Store the name in the item, stripping the leading '\n' from the output
    item["Card_Name"] = saved_name.strip()
    # Check whether the output is null, which happens when one card has two different conditions
    if item["Card_Name"] is not None:
        # If not null, then store the value in saved_name
        saved_name = item["Card_Name"].strip()
    # If null, reuse the previous card name, since a null value means the same card appears twice
    else:
        item["Card_Name"] = saved_name
    # Extract the condition, stock, and price using the corresponding HTML from the website
    item["Condition"] = game.css("td[class^=deckdbbody].search_results_7 a::text").get()
    item["Stock"] = game.css("td[class^=deckdbbody].search_results_8::text").extract_first()
    item["Price"] = game.css("td[class^=deckdbbody].search_results_9::text").extract_first()
    # Prices wrapped in a span come back as null above, so fall back to the span text
    if item["Price"] is None:
        item["Price"] = game.css("td[class^=deckdbbody].search_results_9 span::text").get()

    # Return values
    yield item

Upvotes: 0

Views: 427

Answers (2)

Tim

Reputation: 201

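Here is what ended up working: limiting the fallback selector to the span whose style contains color:red, so the crossed-out span no longer matches.

if item["Price"] is None:
    item["Price"] = game.css("td[class^=deckdbbody].search_results_9 span[style*='color:red']::text").get()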

Upvotes: 0

Vitor Falcão

Reputation: 1049

You need to take into account that the style attribute style="text-decoration:line-through" marks the prices you do not want.

For that you could use BeautifulSoup, relying on the fact that the prices that are not crossed out have no style attribute:

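from bs4 import BeautifulSoup as bs
import requests as r

response = r.get(url)  # url is the page you are scraping
soup = bs(response.content, "html.parser")
# 'style': None is meant to keep only the cells that have no style attribute
decks = soup.find_all('td', {'class': 'deckdbbody', 'style': None})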

Now get the text content inside each one, which is the price:

prices = [d.getText().strip() for d in decks]
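If the crossed-out price turns out to live in a span inside the price cell rather than on the td itself (which is what the selector in the other answer suggests), a variation is to delete the crossed-out spans before reading the cell text. This is only a sketch on hypothetical sample HTML; the names SAMPLE_HTML, is_crossed_out and current_prices are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the selectors in this thread, not copied from the site
SAMPLE_HTML = """
<table>
  <tr class="deckdbbody">
    <td class="deckdbbody search_results_9">$1.99</td>
  </tr>
  <tr class="deckdbbody2">
    <td class="deckdbbody2 search_results_9"><span style="text-decoration:line-through;">$4.99</span> <span style="color:red">$3.49</span></td>
  </tr>
</table>
"""


def is_crossed_out(style):
    """True when a style attribute value marks the text as struck through."""
    return style is not None and "line-through" in style


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
current_prices = []
for cell in soup.find_all("td", class_="search_results_9"):
    # Remove every crossed-out span so only the current price is left in the cell
    for span in cell.find_all("span", style=is_crossed_out):
        span.decompose()
    current_prices.append(cell.get_text(strip=True))

print(current_prices)
```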

With your update, I can see you will get unwanted items in the prices list, because many td elements use this class without holding a price. An easy fix is to check whether the text returned by .getText() contains a dollar sign:

final = []
for price in prices:
    if '$' in price:
        final.append(price)

Now final only has what you really want.
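One caveat with the dollar-sign check is that any cell text mentioning a dollar amount would pass it. A stricter option, sketched below with only the standard library, is to require the whole string to look like a price. The pattern and the sample list are assumptions for illustration (US-style amounts such as $3.49 or $1,250.00) and may need adjusting for the real site.

```python
import re

# Assumed price format: a dollar sign, digits with optional thousands commas,
# and an optional two-digit cents part
PRICE_RE = re.compile(r"\$\d[\d,]*(?:\.\d{2})?")


def keep_prices(texts):
    """Return only the strings that consist entirely of a price."""
    cleaned = (t.strip() for t in texts)
    return [t for t in cleaned if PRICE_RE.fullmatch(t)]


# Hypothetical cell texts, as they might come out of the prices list above
sample = ["Near Mint", "$3.49", " $12.00 ", "4", "Ships for $5 or more", "$1,250.00"]
final = keep_prices(sample)
print(final)
```

Here "Ships for $5 or more" contains a dollar sign but is dropped, because fullmatch requires the entire string to match the pattern.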

Upvotes: 1
