Reputation: 23
Well, I have been trying this for a few days. I have to say that I am very new to Python and I don't fully understand JavaScript (for now), so maybe it's something stupid, but I can't figure it out.
I want to scrape the different tables on BSCScan. For simplicity, and because all of them are almost the same, I will show the code for the holders table. I want to store each row in a dict, append it to a list of dicts, and convert that to a DataFrame with pandas, with columns for address, quantity and percentage. This is the approach I made with requests_html:
from requests_html import HTMLSession
contract = "0x84c0160d55a05a28a034e1e6776f84c5995aba3a"
url = ("https://bscscan.com/token/" + contract + "#balances")
session = HTMLSession()
r = session.get(url)
r.html.render(sleep=3)
print(r.content)
holders = r.html.xpath('//td', first=True)
print(holders)  # This returns None
That is the code I was making with requests_html, and with Selenium this is it:
import os
import pandas as pd
from selenium import webdriver

driver_path = "/Users/XXX/Downloads/chromedriver"
driver = webdriver.Chrome(driver_path)

def get_top_holders(driver, holders):
    list_holders = []
    driver.get(holders)
    tabla = driver.find_element_by_xpath('.//tbody').get_attribute('innerHTML')  # I tried innerHTML just to see if it works in that way.
    for td in tabla.find_elements_by_xpath('.//tr'):
        name = td.find_element_by_xpath('.//td[2]/span/a').get_attribute('textContent')
        quantity = td.find_element_by_xpath('.//td[3]').get_attribute('textContent')
        percentage = td.find_element_by_xpath('//td[4]/text')
        dict = {
            'Address': name,
            'Quantity': quantity,
            'Percentage': percentage
        }
        os.system('clear')
        print(dict)
        list_holders.append(dict)
    driver.close()
    print(list_holders)
    holders_tabla = pd.DataFrame(list_holders)
    return holders_tabla
I have tried with Selenium, letting the page render and trying to extract the data, but I can't iterate over the tbody. I have tried with Beautiful Soup but I don't understand it completely, and someone recommended requests_html to me, but it is returning None.
First time asking, thanks in advance!
Upvotes: 2
Views: 760
Reputation: 195508
You can use their API URL to get the data. For example (I'm using BeautifulSoup to parse the result, but you could use a different parser):
import requests
from bs4 import BeautifulSoup

api_url = "https://bscscan.com/token/generic-tokenholders2"
params = {
    "m": "normal",
    "a": "0x84c0160d55a05a28a034e1e6776f84c5995aba3a",
    "p": "1",
}

soup = BeautifulSoup(
    requests.get(api_url, params=params).content, "html.parser"
)

for row in soup.select("tr:has(td)"):
    tds = [td.get_text(strip=True) for td in row.select("td")]
    print(*tds)
Prints:
1 Burn Address 100,000,000 0.0000%
2 PancakeSwap V2: COSP 57,731,808.58566958 0.0000%
3 0xb545108f3ee074de830a21e09c6b9cce87b43edb 1,999,989.75962961 0.0000%
4 0x5015219bbf9b1bacc912523f002e28d044942e90 1,983,225.07328063 0.0000%
5 0x8081b65976ff685f9cd14ac7bd19ba2d324ffe68 1,945,354.7303935 0.0000%
6 0x8806048c354ab50ed075f1ab1977c6dbd87fdc69 1,876,725.00624433 0.0000%
7 0x7879086b868da3d6ae27a6f0011bec351d8c3dd5 1,800,360.0720144 0.0000%
8 0xe56b4b0721584a03bbedf9e2f3f0e0f6a6be2202 1,800,360.0720144 0.0000%
9 0x878f6540f71d3b2729c4854ce779e13dc91c48c3 1,800,360.0720144 0.0000%
10 0x065ee8c70baebce22f1dafb1214273defaac133c 1,800,360.0720144 0.0000%
11 0xd8e3c0d4a70cb50b586733f416c3dd4906dd4499 1,800,360.0720144 0.0000%
12 0x3ee4dcc05e7f6eae071815ae95ed41df4d466cd4 1,800,360.0720144 0.0000%
13 0xada3871e7c0b4a4fa2b11d7a09ce3d829a8b110a 1,800,360.0720144 0.0000%
14 0x27504201a13001ce938149e3e8230da4a9ec0a54 1,800,270.0360045 0.0000%
15 0x978089f31aa3fa2e6a0865dc4e3b795414a5f7c9 1,799,187.20065861 0.0000%
16 0x4faf96b1278e8d7deeb6560a6a3cab657a6da33b 1,020,175.59821628 0.0000%
17 0xb5e983f828814f306040564d71e0d94dc4721b91 664,109.04450073 0.0000%
18 0x63abec77beb48423fb33a955c777e4e4276b95da 635,531.52955672 0.0000%
19 0xd8b41003a4e62715286c67a4719dab652fbe1e71 592,815.06211795 0.0000%
20 0xe498cfe840242a8f177dc5d09f27e3bc55e6cb12 475,778.39898666 0.0000%
21 0x40cd447c147e2cabb67682314cf2d9d4b522fa5f 456,067.83519979 0.0000%
22 0xec002d3385713f927ae847545c4791ce0bd2bf17 366,310.61997223 0.0000%
23 0xd0f9259bd689c8e7d6713ab9f796792f828f2e7e 318,070.02799289 0.0000%
24 0xf00848ae5a85da1e8b434abe5ea54666e89e64c0 273,028.06319406 0.0000%
25 0x81f7c7b2a4997c9eed08011b8e00dbdbf6ec22f3 239,843.71466396 0.0000%
26 0x5aa492a3bc1557c9ac39c3b99aeae31067ce771b 214,421.6622502 0.0000%
27 0x50ed39a7c924ac9a10e383e9ecf22b3eaee4e8a4 211,162.09086189 0.0000%
28 0x58b15e3514e6823ec80f4849c28254967027d287 180,580.62402781 0.0000%
29 0x77abe93a18deddcc927cafa2cc95ec78b4095a53 180,023.86819365 0.0000%
30 0x7f578c7c53eba69cb418215fb4801b4793efa071 179,891.14759961 0.0000%
31 0xef9fa3d99c0ca765e360efcd7728be875aca1b8d 170,311.18508374 0.0000%
32 0xfed8f16992a2cb954740cef9727e4f4b7ea15dec 154,927.71290792 0.0000%
33 0x6e38208987a9f4d6e45d00f466b8c75ce2e0d0df 153,978.7079671 0.0000%
34 0x7b075ac1bf5f29707341b7d60889fe21ec57904b 141,726.68977964 0.0000%
35 0x76461f752c7aca9f038d29d86ec3173d840a1528 117,847.26898315 0.0000%
36 0x961a9541bba7c239a568c27d86df780fc82b0480 116,828.01474016 0.0000%
37 0xa774fdbd97d9e9da95a510c7ee28b74a19775982 115,377.54671336 0.0000%
38 0x634cb24ffa5cf115b482d74db4b12d172170076f 108,001.29601555 0.0000%
39 0x8e1c2ded34a56fa2514ea094fd2f447bb8ddaa5c 103,940.14738693 0.0000%
40 0x797a05431cafcd9460e794859db9d9a957ca76c1 101,927.56484176 0.0000%
41 0x74e5c7a7c45e77a947869281198c2806cf1f0959 100,797.9591702 0.0000%
42 0x84c0160d55a05a28a034e1e6776f84c5995aba3a 96,580.45152327 0.0000%
43 0x85c12ad9c49e19b8e737e8b9fede2454b99e6423 86,423.77670694 0.0000%
44 0xfb0517c9000fda24ba3d0350d41104c6965b92ae 83,317.99957459 0.0000%
45 0x3f3e2f0d8fa19722cd2de622e8dbee8ee2eb3821 78,824.71977489 0.0000%
46 0x4e9668acf14048a82d91b4ac4c0a1a3f0e836cf6 74,783.27179304 0.0000%
47 0x5856c515dbb1b07c525bde9a3da5fe22d7d0df2f 73,175.56242346 0.0000%
48 0x56b2a275bf05536fd16f76fb2fb622553b130672 72,724.86583564 0.0000%
49 0xca51b6bd912e8defc7273878ebd25db50973aabe 71,040.67745694 0.0000%
50 0xc6134f75dc989ac58c37ebe6437c70d1832e517a 69,790.58597696 0.0000%
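Since the original goal was a pandas DataFrame with address, quantity and percentage, the same rows can be collected into a list of dicts and handed to `pd.DataFrame`. A minimal sketch of that last step — the inline `html` sample below stands in for the live response (so it runs offline), and the column names are my own choice:

```python
from bs4 import BeautifulSoup
import pandas as pd

# Trimmed sample of the markup the endpoint above returns (two rows).
html = """
<table><tbody>
<tr><td>1</td><td>Burn Address</td><td>100,000,000</td><td>0.0000%</td></tr>
<tr><td>2</td><td>PancakeSwap V2: COSP</td><td>57,731,808.58566958</td><td>0.0000%</td></tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
for row in soup.select("tr:has(td)"):
    # Each row has four cells: rank, address, quantity, percentage.
    rank, address, quantity, percentage = (
        td.get_text(strip=True) for td in row.select("td")
    )
    rows.append(
        {"Address": address, "Quantity": quantity, "Percentage": percentage}
    )

df = pd.DataFrame(rows)
print(df)
```

The `"p"` parameter in `params` appears to select the result page, so looping it from 1 upward (with the live request) should collect holders beyond the first page.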
Upvotes: 2