Reputation: 11
I cannot figure out how to scrape this page. The information seems to be hidden by ng-show, and after many attempts nothing I found works.
I want to scrape the product description and the shipping time.
This is my current code:
from selenium import webdriver
from selenium.webdriver.common.by import By
# Set up the Chrome driver
driver = webdriver.Chrome()
# Navigate to the website
driver.get("https://cjdropshipping.com/product/silicone-grip-device-finger-exercise-stretcher-finger-gripper-strength-trainer-strengthen-rehabilitation-training-p-1614453269613522944.html?from=HTP")
# Find the element that contains the title of the product
title_element = driver.find_element(By.CSS_SELECTOR, 'div > div > div > div > div > div > pro-detail > div').get_attribute("textContent")
print(title_element)
# Extract the text from the element
title = title_element.text
# Print the title
print(title)
# Close the driver
driver.quit()
Upvotes: 0
Views: 103
Reputation: 3031
You need to wait for a few seconds for the target web elements or the contents on the page to load before you can find them.
[update] You also need to scroll down to the description section so that the description content loads.
Here is the updated solution:
from time import sleep
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.get("https://cjdropshipping.com/product/silicone-grip-device-finger-exercise-stretcher-finger-gripper-strength-trainer-strengthen-rehabilitation-training-p-1614453269613522944.html?from=HTP")
WebDriverWait(driver, 2).until(EC.presence_of_element_located((By.ID, "pd-merchName")))
# scroll down 1000 pixels so the description section loads
driver.execute_script("window.scrollBy(0, 1000);")
sleep(2)
soup = BeautifulSoup(driver.page_source, 'lxml')
title_element = soup.find('div', attrs={"id": "pd-merchName"}).text.strip()
print(title_element)
description1 = soup.find('div', attrs={"class": "pd-new-desc info-box"}).text.strip()
description2 = [i.text for i in soup.find('div', attrs={"id": "pd-description"}).find_all('p')]
print(description1)
print(description2)
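If a single 1000-pixel scroll does not reach the description on a longer page, the scroll can be repeated in steps. The following is only a minimal sketch: scroll_in_steps and its parameters (step, pause, max_steps) are hypothetical names that are not part of the answer above, and the only Selenium call it relies on is driver.execute_script().

```python
from time import sleep


def scroll_in_steps(driver, step=1000, pause=0.5, max_steps=20):
    """Scroll down `step` pixels at a time until the page bottom is reached.

    `driver` is any object exposing Selenium's execute_script() method.
    Returns the number of scroll steps performed.
    """
    steps = 0
    while steps < max_steps:
        # Scroll the window down by `step` pixels.
        driver.execute_script("window.scrollBy(0, arguments[0]);", step)
        steps += 1
        sleep(pause)  # give newly revealed sections time to load
        # Stop once the bottom edge of the viewport reaches the page height.
        at_bottom = driver.execute_script(
            "return window.innerHeight + window.scrollY"
            " >= document.body.scrollHeight;"
        )
        if at_bottom:
            break
    return steps
```

In the script above it could be called as scroll_in_steps(driver) in place of the single scrollBy call and the sleep(2) that follows it. The max_steps cap is there so that a page that keeps growing as you scroll cannot loop forever.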
Upvotes: 0
Reputation: 193048
To extract the product info, ideally you should induce WebDriverWait for visibility_of_element_located(). You can use either of the following locator strategies:
Using CSS_SELECTOR and text attribute:
driver.get('https://cjdropshipping.com/product/silicone-grip-device-finger-exercise-stretcher-finger-gripper-strength-trainer-strengthen-rehabilitation-training-p-1614453269613522944.html?from=HTP')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div#subscribe-box > img"))).click()
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div#pd-merchName > div"))).text)
Using XPATH and get_attribute("innerHTML"):
driver.get('https://cjdropshipping.com/product/silicone-grip-device-finger-exercise-stretcher-finger-gripper-strength-trainer-strengthen-rehabilitation-training-p-1614453269613522944.html?from=HTP')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div#subscribe-box > img"))).click()
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@id='pd-merchName']/div"))).get_attribute("innerHTML").strip())
Console Output:
Silicone Grip Device Finger Exercise Stretcher Finger Gripper Strength Trainer Strengthen Rehabilitation Training
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
You can find a relevant discussion in "How to retrieve the text of a WebElement using Selenium - Python".
Links to useful documentation:
get_attribute() method: gets the given attribute or property of the element.
text attribute: returns the text of the element.
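The difference between the two matters for content hidden by ng-show: Selenium's text attribute returns only the text that is rendered as visible, so it comes back empty for a hidden element, while get_attribute("textContent") also returns text from hidden nodes. A small sketch of a fallback helper, assuming a hypothetical function name element_text (not part of Selenium):

```python
def element_text(element):
    """Return the visible text of a WebElement, or its textContent if hidden.

    `element` is any object with a `.text` attribute and a
    `get_attribute(name)` method, as Selenium's WebElement has.
    """
    visible = element.text.strip()
    if visible:
        return visible
    # Hidden elements yield empty `.text`; read the DOM property instead.
    raw = element.get_attribute("textContent") or ""
    # Collapse the runs of whitespace that raw DOM text usually contains.
    return " ".join(raw.split())
```

It could be used as element_text(driver.find_element(By.ID, "pd-merchName")), where the ID is the one used in the answers above.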
Upvotes: 0