Python Selenium (Scrape Product Info from CJDropship)

I can't figure out how to scrape this page. The information seems to be hidden by ng-show, and nothing I have tried so far works.

Website: https://cjdropshipping.com/product/silicone-grip-device-finger-exercise-stretcher-finger-gripper-strength-trainer-strengthen-rehabilitation-training-p-1614453269613522944.html?from=HTP

I want to scrape the product description and the shipping time

This is my current code:

from selenium import webdriver
from selenium.webdriver.common.by import By


# Set up the Chrome driver
driver = webdriver.Chrome()

# Navigate to the website
driver.get("https://cjdropshipping.com/product/silicone-grip-device-finger-exercise-stretcher-finger-gripper-strength-trainer-strengthen-rehabilitation-training-p-1614453269613522944.html?from=HTP")

# Find the element that contains the title of the product
title_element = driver.find_element(By.CSS_SELECTOR, 'div > div > div > div > div > div > pro-detail > div')

# Extract the text from the element; get_attribute("textContent") already
# returns a string, so calling .text on its result would raise AttributeError
title = title_element.get_attribute("textContent")

# Print the title
print(title)

# Close the driver
driver.quit()

Upvotes: 0

Answers (2)

Ajeet Verma

Reputation: 3031

You need to wait a few seconds for the target elements and the page contents to load before you can locate them.

[Update] You also need to scroll down to the description section so that the description content gets loaded.

Here is the updated solution:

from time import sleep
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()

driver.get("https://cjdropshipping.com/product/silicone-grip-device-finger-exercise-stretcher-finger-gripper-strength-trainer-strengthen-rehabilitation-training-p-1614453269613522944.html?from=HTP")
WebDriverWait(driver, 2).until(EC.presence_of_element_located((By.ID, "pd-merchName")))

# scroll down by 1000 pixels so that the description section loads
driver.execute_script("window.scrollBy(0, 1000);")
sleep(2)

soup = BeautifulSoup(driver.page_source, 'lxml')
title_element = soup.find('div', attrs={"id": "pd-merchName"}).text.strip()
print(title_element)

description1 = soup.find('div', attrs={"class": "pd-new-desc info-box"}).text.strip()
description2 = [i.text for i in soup.find('div', attrs={"id": "pd-description"}).find_all('p')]

print(description1)
print(description2)
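
The snippet above scrolls once by 1000 pixels. If the description sits further down the page, scrolling in several steps may be more reliable. The following is a minimal sketch, assuming the content loads lazily as the page is scrolled; `scroll_in_steps` and its parameters are hypothetical names (not part of Selenium), and the only driver method it relies on is `execute_script`.

```python
from time import sleep


def scroll_in_steps(driver, step=1000, pause=0.5, max_steps=20):
    """Scroll down `step` pixels at a time until the bottom is reached.

    Hypothetical helper: `driver` is any object that provides Selenium's
    execute_script(script) method. Returns the number of scrolls performed.
    """
    scrolls = 0
    for _ in range(max_steps):
        # Current scroll offset, viewport height and total page height.
        offset, viewport, height = driver.execute_script(
            "return [window.pageYOffset, window.innerHeight, "
            "document.body.scrollHeight];"
        )
        if offset + viewport >= height:
            break  # already at the bottom of the page
        driver.execute_script(f"window.scrollBy(0, {step});")
        scrolls += 1
        sleep(pause)  # give lazily loaded content time to render
    return scrolls
```

It could stand in for the single `window.scrollBy` call, for example by calling `scroll_in_steps(driver)` just before reading `driver.page_source`. The `max_steps` cap is there so the loop still ends on pages that keep growing.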

Upvotes: 0

undetected Selenium

Reputation: 193048

To extract the product info, you should ideally use WebDriverWait with visibility_of_element_located() so that the element is visible before you read it. You can use either of the following locator strategies:

Using CSS_SELECTOR and text attribute:

driver.get('https://cjdropshipping.com/product/silicone-grip-device-finger-exercise-stretcher-finger-gripper-strength-trainer-strengthen-rehabilitation-training-p-1614453269613522944.html?from=HTP')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div#subscribe-box > img"))).click()
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div#pd-merchName > div"))).text)

Using XPATH and get_attribute("innerHTML"):

driver.get('https://cjdropshipping.com/product/silicone-grip-device-finger-exercise-stretcher-finger-gripper-strength-trainer-strengthen-rehabilitation-training-p-1614453269613522944.html?from=HTP')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div#subscribe-box > img"))).click()
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@id='pd-merchName']/div"))).get_attribute("innerHTML").strip())

Console Output:

Silicone Grip Device Finger Exercise Stretcher Finger Gripper Strength Trainer Strengthen Rehabilitation Training

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python
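
Since the question mentions content hidden by ng-show, it is worth noting that Selenium's `text` returns only the text that is actually rendered, so it comes back empty for hidden elements, while `get_attribute("textContent")` reads the DOM property regardless of visibility. A minimal sketch of a fallback between the two follows; `read_text` is a hypothetical helper name, not a Selenium API.

```python
def read_text(element):
    """Return the element's visible text, or its textContent if none is visible.

    Hypothetical helper: `element` is any object exposing Selenium's
    WebElement interface (.text and .get_attribute(name)).
    """
    visible = (element.text or "").strip()
    if visible:
        return visible
    # Hidden elements render no text; fall back to the DOM property.
    return (element.get_attribute("textContent") or "").strip()
```

With a real driver it would be used as `read_text(driver.find_element(By.ID, "pd-merchName"))`, where the ID is the one used in the answers above.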

References

Link to useful documentation:

get_attribute() method: gets the given attribute or property of the element.
text attribute: returns the text of the element.
Difference between text and innerHTML using Selenium

Upvotes: 0
