Reputation: 174
I have been struggling to scrape a website with the code below, and it returns null records. Printing the output data does not show the requested output. The website I am trying to scrape is https://coinmarketcap.com/. There are several pages (64 pages) that need to be loaded into the data frame.
import requests
import pandas as pd
url = "https://api.coinmarketcap.com/data-api/v3/topsearch/rank"
req= requests.post(url)
main_data=req.json()
Can anyone help me sort this out?
Upvotes: 0
Views: 326
Reputation: 15568
Their terms of use prohibit web scraping. The site provides a well-documented API that has a free tier. Register and get an API token:
from requests import Session
url = 'https://pro-api.coinmarketcap.com/v1/cryptocurrency/listings/latest'
parameters = {
    'start': '1',
    'limit': '5000',
    'convert': 'USD',
}
headers = {
    'Accept': 'application/json',  # standard HTTP header name is "Accept"
    'X-CMC_PRO_API_KEY': HIDDEN_TOKEN,  # replace that with your API key
}
session = Session()
session.headers.update(headers)
response = session.get(url, params=parameters)
data = response.json()
print(data)
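Since the question asks for a data frame, here is a minimal sketch of flattening such a response into one flat record per coin. It assumes the payload has a top-level "data" list whose items carry "name", "symbol", and a nested "quote" keyed by currency; that shape, the helper name listings_to_rows, and the sample values are assumptions for illustration, so check them against the real response.

```python
def listings_to_rows(payload, currency="USD"):
    """Flatten a listings-style payload into one flat dict per coin.

    Assumes payload["data"] is a list of coins, each with a nested
    "quote" dict keyed by currency code (an assumed shape).
    """
    rows = []
    for coin in payload.get("data", []):
        quote = coin.get("quote", {}).get(currency, {})
        rows.append({
            "name": coin.get("name"),
            "symbol": coin.get("symbol"),
            "price": quote.get("price"),
            "market_cap": quote.get("market_cap"),
        })
    return rows


# Hand-written sample standing in for response.json(); not real market data.
sample = {
    "data": [
        {"name": "Coin A", "symbol": "AAA",
         "quote": {"USD": {"price": 1.5, "market_cap": 1000.0}}},
        {"name": "Coin B", "symbol": "BBB",
         "quote": {"USD": {"price": 0.25, "market_cap": 50.0}}},
    ]
}

rows = listings_to_rows(sample)
print(rows)
```

With pandas installed, pd.DataFrame(rows) should turn the list of flat dicts into a data frame with one column per key.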
Upvotes: 0
Reputation: 3400
Instead of using post, use get in the requests call and it will work!
import requests
res=requests.get("https://api.coinmarketcap.com/data-api/v3/topsearch/rank")
main_data=res.json()
data=main_data['data']['cryptoTopSearchRanks']
With all pages: you can find this URL in the browser's Network tab. Open the XHR filter and reload the page, then go to the second page; the request URL will appear in the XHR tab, and you can copy it and call it directly. I have shortened the URL here.
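from bs4 import BeautifulSoup
res=requests.get("https://coinmarketcap.com/")
soup=BeautifulSoup(res.text,"html.parser")
# The class name below is generated by the site and may change over time
last_page=soup.find_all("p",class_="sc-1eb5slv-0 hykWbK")[-1].get_text().split(" ")[-1]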
res=requests.get(f"https://api.coinmarketcap.com/data-api/v3/cryptocurrency/listing?start=1&limit={last_page}&sortBy=market_cap&sortType=desc&convert=USD,BTC,ETH&cryptoType=all&tagType=all&audited=false&aux=ath")
Now use the json method:
data=res.json()['data']['cryptoCurrencyList']
print(len(data))
Output:
6304
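If you would rather walk the listing page by page than request everything in one call, the start/limit pattern above can be sketched as a loop. This is only an illustration: fetch_all, fake_fetch, and FAKE are hypothetical names, and the fake fetcher stands in for a real request so the loop can be tried offline.

```python
def fetch_all(fetch_page, page_size=100, max_pages=1000):
    """Collect items page by page until a page comes back short or empty.

    fetch_page(start, limit) must return a list of items; start is 1-based,
    matching the start=1 parameter used in the listing URL above.
    """
    items = []
    start = 1
    for _ in range(max_pages):  # hard cap so a misbehaving source cannot loop forever
        page = fetch_page(start, page_size)
        items.extend(page)
        if len(page) < page_size:  # a short page means there is nothing left
            break
        start += page_size
    return items


# Hypothetical stand-in data source with 250 records.
FAKE = [{"id": i} for i in range(1, 251)]


def fake_fetch(start, limit):
    """Return the slice of FAKE that a 1-based start/limit request would cover."""
    return FAKE[start - 1:start - 1 + limit]


all_items = fetch_all(fake_fetch, page_size=100)
print(len(all_items))  # 250
```

A real fetcher could call requests.get on the listing URL with the given start and limit and return res.json()['data']['cryptoCurrencyList'], assuming the endpoint keeps responding the way it does in this answer.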
Upvotes: 1
Reputation:
To get/read the data, you need to use the get method, not post:
import requests
import pandas as pd
import json
url = "https://api.coinmarketcap.com/data-api/v3/topsearch/rank"
req = requests.get(url)
main_data = req.json()
print(main_data) # without pretty printing
pretty_json = json.loads(req.text)
print(json.dumps(pretty_json, indent=4)) # with pretty print
Upvotes: 0