Reputation: 392
I want to extract the product names from this page: https://shopee.com.my/search?keyword=h370m. @DebanjanB helped me in the question "Selenium can not scrape Shopee e-commerce site using python", but I can't work out how to apply an XPath for the product name to that solution. Here is my code:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('start-maximized')
options.add_argument('disable-infobars')
options.add_argument('--disable-extensions')
browserdriver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\Users\admin\Desktop\chromedriver_win32\Chromedriver')
browserdriver.get('https://shopee.com.my/search?keyword=h370m')
WebDriverWait(browserdriver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[@class='shopee-modal__container']//button[text()='English']"))).click()
print([my_element.text for my_element in WebDriverWait(browserdriver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, ".//*[@class='_1JAmkB']")))])
print("Program Ended")
I also tried other XPath expressions, such as:
By.XPATH, ".//*[@class='_1JAmkB']/child::div"
or
//div[contains(concat(' ', normalize-space(@class), ' '), ' _1NoI8_ ')]
Neither of them gives the expected result.
The only output I get is:
['', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
Program Ended
Please help me solve this problem. Thanks!
Upvotes: 2
Views: 800
Reputation: 84465
XPath:
Use the XPath below, and read each element's innerHTML rather than .text:
//*[@class="_1NoI8_ _2gr36I"]
Then extract the innerHTML:
print([my_element.get_attribute('innerHTML') for my_element in WebDriverWait(browserdriver, 10).until(EC.presence_of_all_elements_located((By.XPATH, '//*[@class="_1NoI8_ _2gr36I"]')))])
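Selenium's .text only returns text it considers visible, which is one way to end up with a list of empty strings. As a rough sketch of the "innerHTML, not .text" advice, the helper below (the name visible_or_raw_text is made up for illustration) tries .text first and falls back to the textContent attribute, which gives the raw text without nested markup:

```python
def visible_or_raw_text(element):
    """Return element.text, falling back to textContent when .text is empty.

    `element` is expected to behave like a Selenium WebElement, i.e. to
    expose a `.text` property and a `get_attribute(name)` method.
    """
    text = (element.text or "").strip()
    if text:
        return text
    # textContent is readable even when the element is not rendered as visible
    raw = element.get_attribute("textContent")
    return (raw or "").strip()
```

It could be dropped into the list comprehension above in place of my_element.get_attribute('innerHTML').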
CSS:
print([my_element.get_attribute('innerHTML') for my_element in WebDriverWait(browserdriver, 10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "._1NoI8_._2gr36I")))])
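One difference between the two selectors is worth noting: the XPath @class="_1NoI8_ _2gr36I" compares the whole attribute string, so it stops matching if the classes are reordered or another class is added, whereas the CSS form matches on individual class tokens. A token-based XPath in the style the question already tried can be built with a small helper; this is only a sketch, and xpath_with_classes is a hypothetical name, not part of Selenium:

```python
def xpath_with_classes(*class_names, tag="*"):
    """Build an XPath that matches elements carrying all the given class tokens."""
    if not class_names:
        raise ValueError("at least one class name is required")
    conditions = [
        "contains(concat(' ', normalize-space(@class), ' '), ' {} ')".format(name)
        for name in class_names
    ]
    return "//{}[{}]".format(tag, " and ".join(conditions))
```

For example, xpath_with_classes("_1NoI8_", "_2gr36I") can be passed to By.XPATH in place of the exact-match expression.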
API:
I still think the API is the better option; I showed how to use it here. I get the names and prices on every run, so I'm not sure what caused the issue you saw over time (though I don't know how many times you ran it). With the API you also don't need to scroll to load all of the results.
With a short wait, you can also extract all of the data from the script tags on the page:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import time
import json
browserdriver = webdriver.Chrome()
browserdriver.get('https://shopee.com.my/search?keyword=h370m')
WebDriverWait(browserdriver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[@class='shopee-modal__container']//button[text()='English']"))).click()
time.sleep(2)
# Collect every <script type="application/ld+json"> element on the page
products = WebDriverWait(browserdriver, 10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '[type="application/ld+json"]')))
# Skip the first ld+json block and keep the raw JSON text of the rest
products_json = [product.get_attribute('innerHTML') for product in products[1:]]
names = [json.loads(product)['name'] for product in products_json]  # just showing name extraction from the JSON
print(len(names))
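The same extraction can be tried on the page source alone, which makes it easy to test without a browser. The sketch below uses only the standard library. Instead of skipping the first script tag by index, it keeps blocks whose "@type" is "Product"; that filter is an assumption based on the usual schema.org convention, not something confirmed for this site. LdJsonCollector and product_names are illustrative names.

```python
import json
from html.parser import HTMLParser


class LdJsonCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_ld_json = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_ld_json = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_ld_json:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_ld_json:
            self.blocks.append("".join(self._buffer))
            self._in_ld_json = False


def product_names(page_source):
    """Return the 'name' of each ld+json block that looks like a product."""
    collector = LdJsonCollector()
    collector.feed(page_source)
    collector.close()
    names = []
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except ValueError:
            continue  # ignore blocks that are not valid JSON
        # Assumption: product entries are objects marked "@type": "Product"
        if isinstance(data, dict) and data.get("@type") == "Product" and "name" in data:
            names.append(data["name"])
    return names
```

With the driver from the code above, this would be called as product_names(browserdriver.page_source) after the wait.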
Upvotes: 2