Dfhaa_DK

Reputation: 131

Web scraping using Selenium in python - trouble retrieving all data

I am trying to scrape coinmarketcap.com using Selenium to retrieve data such as coin name, market cap, price and circulating supply. However, I am not succeeding: I am only able to retrieve 11 alt coins and no more. I have also looked into several ways of rendering the JavaScript (which I presume coinmarketcap is built with). Here is the start of my code:

from selenium import webdriver

driver = webdriver.Chrome(r'C:\Users\Ejer\PycharmProjects\pythonProject\chromedriver')
driver.get('https://coinmarketcap.com/')

# Row containers, located via their obfuscated class names
Crypto = driver.find_elements_by_xpath("//div[contains(concat(' ', normalize-space(@class), ' '), 'sc-16r8icm-0 sc-1teo54s-1 lgwUsc')]")
#price = driver.find_elements_by_xpath('//td[@class="cmc-link"]')
#coincap = driver.find_elements_by_xpath('//td[@class="DAY"]')

CMC_list = []
for c in range(len(Crypto)):
    CMC_list.append(Crypto[c].text)
print(CMC_list)

driver.close()

My goal is to store the names, market cap, price and circulating supply in a DataFrame so I can apply machine learning methods and analyze the data, so I am open to any suggestions. Thanks in advance.
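
To make the goal concrete, the end result I have in mind is roughly this; a minimal sketch assuming the table is fully rendered in the page source (pandas.read_html needs lxml or html5lib installed):

import pandas as pd

# Once `driver` holds a fully rendered page, pandas can parse the listings <table> directly.
# read_html returns a list of DataFrames, one per <table> found in the HTML.
tables = pd.read_html(driver.page_source)
df = tables[0]  # the main listings table is expected to be the first one
print(df.head())

From there the name, market cap, price and circulating supply columns could be selected for analysis.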

Upvotes: 0

Views: 884

Answers (2)

Gabriel Popa

Reputation: 11

Facing the same problem, I added page scrolling before the Crypto = driver.find_elements_by_xpath... line, like this:

import time

SCROLL_PAUSE_TIME = 1  # any short pause will do; give the page time to render new rows

i = 0
while i < 15:
    driver.execute_script("window.scrollBy(0, window.innerHeight)")
    time.sleep(SCROLL_PAUSE_TIME)
    i += 1
Crypto = driver.find_elements_by_xpath('//div[@class="sc-16r8icm-0 sc-1teo54s-0 dBKWCw"]')

On my laptop, scrolling down the page 13 times is enough to refresh all 100 coins; I put 15 just to be sure. The next step is to get the refreshed content; perhaps I have to repeat the scrolling every 1 or 2 minutes to keep it fresh. This is my first post here, and it was hard enough to insert the code. I hope it's useful.
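
A variation on the same idea is to keep scrolling until the number of matched rows stops growing, rather than hard-coding 15 iterations. A minimal sketch, assuming `driver` from the question and the same XPath (the obfuscated class name may change over time):

import time

SCROLL_PAUSE_TIME = 1  # arbitrary pause between scrolls; tune to your connection
ROW_XPATH = '//div[@class="sc-16r8icm-0 sc-1teo54s-0 dBKWCw"]'

previous_count = -1
while True:
    rows = driver.find_elements_by_xpath(ROW_XPATH)
    if len(rows) == previous_count:
        break  # no new rows appeared since the last scroll, assume we are done
    previous_count = len(rows)
    driver.execute_script("window.scrollBy(0, window.innerHeight)")
    time.sleep(SCROLL_PAUSE_TIME)

Crypto = driver.find_elements_by_xpath(ROW_XPATH)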

Upvotes: 1

undetected Selenium

Reputation: 193048

To retrieve the list of coin names you need to close the cookie consent banner, close the popup, and induce WebDriverWait for visibility_of_all_elements_located(). You can use either of the following Locator Strategies (a sketch extending the same approach to the other columns follows the notes below):

  • Using CSS_SELECTOR and get_attribute("innerHTML"):

    driver.get("https://coinmarketcap.com/")
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div.cmc-cookie-policy-banner__close"))).click()
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button/b[text()='No, thanks']"))).click()
    print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "table.cmc-table tbody tr td > a p[color='text']")))])
    
  • Using XPATH and text attribute:

    driver.get("https://coinmarketcap.com/")
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div.cmc-cookie-policy-banner__close"))).click()
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button/b[text()='No, thanks']"))).click()
    print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[contains(@class, 'cmc-table')]//tbody//tr//td/a//p[@color='text']")))])
    driver.quit()
    
  • Console Output:

    ['Bitcoin', 'Ethereum', 'XRP', 'Tether', 'Litecoin', 'Bitcoin Cash', 'Chainlink', 'Cardano', 'Polkadot', 'Binance Coin', 'Stellar', 'USD Coin', 'Bitcoin SV']
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
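As a follow-up, since the question also asks for market cap, price and circulating supply in a DataFrame, the same waiting strategy can be extended to whole table rows. This is a sketch only; it assumes the cookie banner and popup have already been closed as above, and the column positions are an assumption about the then-current layout that may need adjusting:

import pandas as pd

# Assumed column positions within each table row; inspect the live page and adjust if the layout differs
NAME_COL, PRICE_COL, MCAP_COL, SUPPLY_COL = 2, 3, 6, 8

rows = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "table.cmc-table tbody tr")))
data = []
for row in rows:
    cells = [td.text for td in row.find_elements_by_tag_name("td")]
    # Rows further down the page may be empty until scrolled into view (see the other answer)
    if len(cells) > SUPPLY_COL and cells[NAME_COL]:
        data.append({
            "Name": cells[NAME_COL],
            "Price": cells[PRICE_COL],
            "Market Cap": cells[MCAP_COL],
            "Circulating Supply": cells[SUPPLY_COL],
        })

df = pd.DataFrame(data)
print(df.head())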

Upvotes: 0
