Reputation: 31
I am trying to collect market cap data from coinmarketcap.com. I can successfully get the top 10 coins by market cap, but it stops working after the top 10 (the results become None).
Here is my code (I used Chrome):
import requests
import time
from bs4 import BeautifulSoup

url = 'https://coinmarketcap.com/'
strhtml = requests.get(url)
soup = BeautifulSoup(strhtml.text, 'lxml')
result = {}
# head of selector
baseAddr1 = ('#__next > div.bywovg-1.sXmSU > div.main-content > '
             'div.sc-57oli2-0.comDep.cmc-body-wrapper > div > div:nth-child(1) > '
             'div.h7vnx2-1.bFzXgL > table > tbody > ')
baseAddr3 = ' > td:nth-child(3) > div > a'  # end of selector
for i in range(1, 21):
    if i % 10 == 0:
        time.sleep(3)
        print('resting...')
    baseAddr2 = 'tr:nth-child(' + str(i) + ')'  # middle of selector, i for the order of coin
    Addr = baseAddr1 + baseAddr2 + baseAddr3  # full selector
    #print(Addr)
    data = soup.select(Addr)
    for item in data:
        result.update({item.get_text(): item.get('href')})
print(result)
Thanks for your help!
Upvotes: 3
Views: 1081
Reputation: 71451
The site first displays and then hides each row of coin data as you scroll down the page, which is likely why the HTML fetched with requests only yields the first rows. To trigger this behavior and grab each row when it appears on scroll, you can use selenium. For the sake of speed, the answer below uses a small bit of JavaScript, run via selenium, to pull the results:
from selenium import webdriver
from bs4 import BeautifulSoup as soup
import pandas as pd

# Selenium 3 style call; in Selenium 4 the driver path is passed via a Service object (or omitted)
d = webdriver.Chrome('/path/to/chromedriver')
d.get('https://coinmarketcap.com/')
results = d.execute_script('''
    // scroll to the bottom of the page first
    window.scrollTo(0, document.body.scrollHeight)
    function* get_coin_data(){
        // header cells, minus the first and the last two columns
        var h = Array.from(document.querySelectorAll('table.h7vnx2-2.bFpGkc.cmc-table thead th'))
        var hds = h.slice(1, h.length-2).map(x => x.textContent)
        for (var i of document.querySelectorAll('table.h7vnx2-2.bFpGkc.cmc-table tbody tr')){
            var n_hds = JSON.parse(JSON.stringify(hds))  // fresh copy of the headers for this row
            i.scrollIntoView()  // bring the row into view before reading its cells
            var tds = Array.from(i.querySelectorAll('td'))
            yield Object.fromEntries(tds.slice(1, tds.length-2).map(function(x){
                return [n_hds.shift(), x.querySelector(':is(.etpvrL, .iworPT, .cLgOOr, .kAXKAX, .hzgCfk, .hykWbK, .kZlTnE)').textContent]
            }));
        }
    }
    return [...get_coin_data()]
''')
df = pd.DataFrame(results)
print(df)
Output:
      #  24h %    7d %  ...          Name       Price      Volume(24h)
0     1  1.03%   1.05%  ...       Bitcoin  $48,678.16  $29,904,091,891
1     2  0.25%   1.20%  ...      Ethereum   $3,236.58  $15,197,663,099
2     3  0.86%  15.01%  ...       Cardano       $2.82   $6,389,958,677
3     4  1.94%   6.72%  ...  Binance Coin     $483.64   $1,850,753,287
4     5  0.03%   0.04%  ...        Tether       $1.00  $65,270,928,498
..  ...    ...     ...  ...           ...         ...              ...
95   96  2.08%   7.45%  ...      DigiByte    $0.06528      $24,887,122
96   97  2.33%  10.56%  ...       Horizen      $83.24      $57,256,134
97   98  0.06%   0.03%  ...    Pax Dollar     $0.9996      $86,915,502
98   99  0.02%   1.35%  ...      Ontology       $1.07     $123,632,824
99  100  1.34%   0.57%  ...          ICON       $1.40      $56,657,155

[100 rows x 8 columns]
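If you only need the name-to-link mapping from the question, the same scroll-then-read idea can also be driven from Python instead of in-page JavaScript. The following is only a minimal sketch under assumptions: collect_coin_links and its parameters are hypothetical names, and the selectors 'table tbody tr' and 'td:nth-child(3) a' are simplified guesses based on the selector in the question, so they may need adjusting to the page's current markup. It will also be slower than the JavaScript version, since every row costs a few round trips to the browser.

```python
import time


def collect_coin_links(driver, row_css="table tbody tr",
                       link_css="td:nth-child(3) a", pause=0.0):
    """Scroll every table row into view and map link text to href.

    `driver` is expected to behave like a selenium WebDriver.
    The default selectors are assumptions, not taken from the site.
    """
    links = {}
    # "css selector" is the locator strategy string behind By.CSS_SELECTOR
    for row in driver.find_elements("css selector", row_css):
        # Bring the row into view before reading it
        driver.execute_script("arguments[0].scrollIntoView()", row)
        if pause:
            time.sleep(pause)  # optional settle time; an assumption, tune as needed
        for a in row.find_elements("css selector", link_css):
            links[a.text] = a.get_attribute("href")
    return links
```

With the driver d from the code above, collect_coin_links(d, pause=0.2) should return a dict shaped like result in the question, provided the selectors match.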
Upvotes: 3