Reputation: 75
I want to scrape this web page using Selenium in Python: https://www.lelo.com/es/juguetes-sexuales-para-parejas.
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import pandas as pd
from selenium.webdriver.common.action_chains import ActionChains
import time
from tqdm import tqdm
from selenium.common.exceptions import NoSuchElementException
driver = webdriver.Chrome()  # assumed: the original post does not show how the driver was created
driver.get('https://www.lelo.com/es/juguetes-sexuales-para-parejas/')
I identified only the visible links on this page with this code:
masa_perso_flist = driver.find_elements_by_xpath('//div[@class="views-field views-field-rendered-entity"]')
filtered_links = [link for link in masa_perso_flist if link.is_displayed()]
listOflinks = []
for masa in filtered_links:
    ppp1 = masa.find_element_by_tag_name('div')
    ppp2 = masa.find_element_by_tag_name('a')
    listOflinks.append(ppp2.get_property('href'))
For each product, I opened the link from listOflinks and tried to extract the name, description, price, number of reviews, and average review. The elements that hold this information are not the same across the product pages. For the name and description, for example, there are two possible XPaths, and I have been able to extract both. However, I'm struggling to capture the price. This is the code I tried for the price:
alldetails = []
for i in tqdm(listOflinks):
    driver.get(i)
    try:
        Precio = driver.find_element_by_xpath('.//td[@class= "price-amount"] |.//table[@class= "price-amount"]').text
        # I also tried: Precio = driver.find_element_by_xpath('.//td[@class= "price-amount"] |.//tr[@class="price-label"]').text
    except NoSuchElementException:
        Precio = "No prices"
    tempJb = {'Precios': Precio}
    alldetails.append(tempJb)
print(alldetails)
This is my output:
[{'Price': '169.00 USD'}, {'Price': ''}, {'Price': ''}, {'Price': ''}, {'Price': ''}, {'Price': ''}]
If my code is wrong, why am I not getting an error message? Why do I get {'Price': ''} instead of {'Price': 'No prices'}? It is probably a silly question, but I would really appreciate your help in learning to write appropriate code for this case. I've tried multiple combinations of XPaths to capture the price information, but I'm still failing. Thanks a lot.
Upvotes: 0
Views: 26
Reputation: 9969
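Try the following, using get_attribute('textContent') instead of .text.
The .text property returns only text that is visibly rendered. An element that matches the XPath but is hidden is still found, so no NoSuchElementException is raised and .text gives an empty string. That is why you see '' rather than 'No prices'.
get_attribute('textContent') will grab the text whether the element is hidden or not.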
driver.get('https://www.lelo.com/es/juguetes-sexuales-para-parejas/')
masa_perso_flist = driver.find_elements_by_xpath('//div[@class="views-field views-field-rendered-entity"]')
filtered_links = [link for link in masa_perso_flist if link.is_displayed()]
listOflinks = []
for masa in filtered_links:
    ppp1 = masa.find_element_by_tag_name('div')
    ppp2 = masa.find_element_by_tag_name('a')
    listOflinks.append(ppp2.get_property('href'))
alldetails = []
for i in tqdm(listOflinks):
    driver.get(i)
    try:
        Precio = driver.find_element_by_xpath('.//td[@class= "price-amount"] |.//table[@class= "price-amount"]').get_attribute('textContent')
        # I also tried: Precio = driver.find_element_by_xpath('.//td[@class= "price-amount"] |.//tr[@class="price-label"]').text
    except NoSuchElementException:
        Precio = "No prices"
    tempJb = {'Precios': Precio}
    alldetails.append(tempJb)
    print(alldetails)
I don't have tqdm but the output looks correct.
Outputs:
[{'Precios': '$229.00'}]
[{'Precios': '$229.00'}, {'Precios': '$539.00'}]
[{'Precios': '$229.00'}, {'Precios': '$539.00'}, {'Precios': '$219.00'}]
[{'Precios': '$229.00'}, {'Precios': '$539.00'}, {'Precios': '$219.00'}, {'Precios': '$209.00'}]
[{'Precios': '$229.00'}, {'Precios': '$539.00'}, {'Precios': '$219.00'}, {'Precios': '$209.00'}, {'Precios': '$249.00'}]
[{'Precios': '$229.00'}, {'Precios': '$539.00'}, {'Precios': '$219.00'}, {'Precios': '$209.00'}, {'Precios': '$249.00'}, {'Precios': '$259.00'}]
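Since NoSuchElementException only fires when nothing matches the XPath, an element that is found but has blank text still slips past the except branch. A minimal sketch of one way to catch that case is below. normalize_price is a hypothetical helper name (it is not part of Selenium), and the "No prices" default is taken from the question's code.

```python
def normalize_price(raw, default="No prices"):
    """Return the price text with whitespace collapsed.

    Falls back to `default` when `raw` is None or contains only whitespace,
    which is the case a try/except NoSuchElementException does not cover.
    """
    if raw is None:
        return default
    cleaned = " ".join(raw.split())  # collapse newlines, tabs, repeated spaces
    return cleaned if cleaned else default


# Hypothetical usage inside the loop above (needs a live Selenium driver):
#     raw = element.get_attribute('textContent')
#     Precio = normalize_price(raw)
```

With this, a blank result is reported as 'No prices' instead of '', and stray whitespace that textContent may include around the price is removed.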
Upvotes: 1