Biscuitlove

Reputation: 31

Python scrape Coinmarketcap

I am trying to collect market cap data from coinmarketcap.com. I can successfully get the top 10 coins by market cap, but it stops working after the top 10 (the results become None).

Here is my code and I used Chrome.

    import time

    import requests
    from bs4 import BeautifulSoup

    url = 'https://coinmarketcap.com/'
    strhtml = requests.get(url)
    soup = BeautifulSoup(strhtml.text, 'lxml')

    result = {}
    # head of selector
    baseAddr1 = ('#__next > div.bywovg-1.sXmSU > div.main-content > '
                 'div.sc-57oli2-0.comDep.cmc-body-wrapper > div > div:nth-child(1) > '
                 'div.h7vnx2-1.bFzXgL > table > tbody > ')
    # end of selector
    baseAddr3 = ' > td:nth-child(3) > div > a'

    for i in range(1, 21):
        # pause briefly after every tenth coin
        if i % 10 == 0:
            time.sleep(3)
            print('resting...')

        # middle of selector; i is the rank of the coin
        baseAddr2 = 'tr:nth-child(' + str(i) + ')'
        Addr = baseAddr1 + baseAddr2 + baseAddr3  # full selector
        #print(Addr)

        data = soup.select(Addr)
        for item in data:
            result.update({item.get_text(): item.get('href')})

    print(result)

Thanks for your help!

Upvotes: 3

Views: 1081

Answers (1)

Ajax1234

Reputation: 71451

The site displays and then hides each row of coin data as you scroll down the page, which is why a single static request only yields the first rows. To trigger this behavior and grab each row as it appears on scroll, you can use Selenium. For the sake of speed, the answer below uses a small bit of JavaScript, run via Selenium, to pull the results:

    from selenium import webdriver
    import pandas as pd

    d = webdriver.Chrome('/path/to/chromedriver')
    d.get('https://coinmarketcap.com/')
    # Run a JavaScript generator inside the page: scroll each row into view so
    # that its cells are rendered, then pair each column header with the cell text.
    results = d.execute_script('''
        window.scrollTo(0,document.body.scrollHeight)
        function* get_coin_data(){
            // column headers, skipping the first column and the last two
            var h = Array.from(document.querySelectorAll('table.h7vnx2-2.bFpGkc.cmc-table thead th'))
            var hds = h.slice(1, h.length-2).map(x => x.textContent)
            for (var i of document.querySelectorAll('table.h7vnx2-2.bFpGkc.cmc-table tbody tr')){
                var n_hds = JSON.parse(JSON.stringify(hds))  // fresh copy of the headers for this row
                i.scrollIntoView()
                var tds = Array.from(i.querySelectorAll('td'))
                yield Object.fromEntries(tds.slice(1, tds.length-2).map(function(x){
                    return [n_hds.shift(), x.querySelector(':is(.etpvrL, .iworPT, .cLgOOr, .kAXKAX, .hzgCfk, .hykWbK, .kZlTnE)').textContent]
                }));
            }
        }
        return [...get_coin_data()]
    ''')
    # one dict per row -> one DataFrame row per coin
    df = pd.DataFrame(results)
    print(df)

Output:

      #  24h %    7d %  ...          Name       Price      Volume(24h)
0     1  1.03%   1.05%  ...       Bitcoin  $48,678.16  $29,904,091,891
1     2  0.25%   1.20%  ...      Ethereum   $3,236.58  $15,197,663,099
2     3  0.86%  15.01%  ...       Cardano       $2.82   $6,389,958,677
3     4  1.94%   6.72%  ...  Binance Coin     $483.64   $1,850,753,287
4     5  0.03%   0.04%  ...        Tether       $1.00  $65,270,928,498
..  ...    ...     ...  ...           ...         ...              ...
95   96  2.08%   7.45%  ...      DigiByte    $0.06528      $24,887,122
96   97  2.33%  10.56%  ...       Horizen      $83.24      $57,256,134
97   98  0.06%   0.03%  ...    Pax Dollar     $0.9996      $86,915,502
98   99  0.02%   1.35%  ...      Ontology       $1.07     $123,632,824
99  100  1.34%   0.57%  ...          ICON       $1.40      $56,657,155

[100 rows x 8 columns]
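
Note that every value in the output above is still a string ('$48,678.16', '1.03%'), and the percentages carry no sign. If you need numbers, a minimal sketch of the conversion could look like the following. The helper names parse_money and parse_percent are made up for illustration (they are not part of pandas or Selenium), and the sample rows are copied from the output above:

```python
def parse_money(text):
    """Convert a string such as '$48,678.16' to a float (hypothetical helper)."""
    return float(text.replace('$', '').replace(',', ''))


def parse_percent(text):
    """Convert a string such as '1.03%' to a float, here 1.03 (hypothetical helper)."""
    return float(text.rstrip('%'))


# Two sample rows copied from the output above; only some columns are shown.
rows = [
    {'Name': 'Bitcoin', 'Price': '$48,678.16', '24h %': '1.03%',
     'Volume(24h)': '$29,904,091,891'},
    {'Name': 'DigiByte', 'Price': '$0.06528', '24h %': '2.08%',
     'Volume(24h)': '$24,887,122'},
]

# Build cleaned copies with numeric values in place of the formatted strings.
cleaned = [
    {
        'Name': row['Name'],
        'Price': parse_money(row['Price']),
        '24h %': parse_percent(row['24h %']),
        'Volume(24h)': parse_money(row['Volume(24h)']),
    }
    for row in rows
]

for row in cleaned:
    print(row)
```

With the DataFrame from the answer, the same helpers could be applied per column, for example df['Price'] = df['Price'].map(parse_money).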

Upvotes: 3
