Snyder Fox

Reputation: 174

Web Scraping with Requests - Python

I have been struggling to scrape a website with the code below, and it returns null records. Printing the output data does not show the requested output. The website I am trying to scrape is https://coinmarketcap.com/. There are several pages (64 pages) that need to be loaded into the data frame.

import requests
import pandas as pd

url = "https://api.coinmarketcap.com/data-api/v3/topsearch/rank"

req= requests.post(url)
main_data=req.json()

Can anyone help me sort this out?

Upvotes: 0

Views: 326

Answers (3)

Prayson W. Daniel

Reputation: 15568

Their terms of use prohibit web scraping. The site provides a well-documented API that has a free tier. Register and get an API token:

from requests import Session

HIDDEN_TOKEN = 'YOUR_API_KEY'  # placeholder value, not a real key

url = 'https://pro-api.coinmarketcap.com/v1/cryptocurrency/listings/latest'
parameters = {
  'start':'1',
  'limit':'5000',
  'convert':'USD'
}
headers = {
  'Accepts': 'application/json',
  'X-CMC_PRO_API_KEY': HIDDEN_TOKEN, # replace that with your API Key
}

session = Session()
session.headers.update(headers)

response = session.get(url, params=parameters)
data = response.json()
print(data)
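The question mentions 64 pages, so the `start` and `limit` parameters above could also be used to fetch the listings page by page instead of in one large call. The sketch below only builds the parameter dicts and makes no network request. The helper name `page_params`, the page size of 100, and the totals passed in are assumptions for illustration, not values taken from the API documentation.

```python
from typing import Dict, Iterator


def page_params(total: int, page_size: int = 100,
                convert: str = "USD") -> Iterator[Dict[str, str]]:
    """Yield one query-parameter dict per page of listings.

    `start` is 1-based, matching the 'start': '1' value used above.
    `total` and `page_size` are assumptions supplied by the caller.
    """
    if total < 0 or page_size <= 0:
        raise ValueError("total must be >= 0 and page_size must be > 0")
    for start in range(1, total + 1, page_size):
        # The last page may be shorter than page_size.
        limit = min(page_size, total - start + 1)
        yield {"start": str(start), "limit": str(limit), "convert": convert}


# Example: 250 rows in pages of 100 gives three parameter sets.
pages = list(page_params(total=250, page_size=100))
```

Each dict could then be passed as `params=` to `session.get(url, params=...)`, and the `data` lists of the responses concatenated.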

Upvotes: 0

Bhavya Parikh

Reputation: 3400

Use a GET request instead of a POST request in the call and it will work!

import requests
res=requests.get("https://api.coinmarketcap.com/data-api/v3/topsearch/rank")
main_data=res.json()
data=main_data['data']['cryptoTopSearchRanks']

With all pages: you can find this URL in the browser's Network tab. Open the XHR filter, reload the page, and go to the second page; the request URL will appear in the XHR tab, where you can copy it and call it directly. I have shortened the URL here:

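from bs4 import BeautifulSoup

# Take the last word of the pagination text on the home page (the total number of listings)
res=requests.get("https://coinmarketcap.com/")
soup=BeautifulSoup(res.text,"html.parser")
last_page=soup.find_all("p",class_="sc-1eb5slv-0 hykWbK")[-1].get_text().split(" ")[-1]
# Use that number as the limit so that a single call returns every listing
res=requests.get(f"https://api.coinmarketcap.com/data-api/v3/cryptocurrency/listing?start=1&limit={last_page}&sortBy=market_cap&sortType=desc&convert=USD,BTC,ETH&cryptoType=all&tagType=all&audited=false&aux=ath")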

Now use the json method:

data=res.json()['data']['cryptoCurrencyList']
print(len(data))

Output:

6304
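Since the goal in the question is a data frame, the list of records may need flattening first. The sketch below is one possible way to do it, assuming each record holds scalar fields plus a nested `quotes` list whose entries carry a `name` key for the currency. That shape, the helper name `flatten_coin`, and the sample values are assumptions for illustration, not confirmed API output.

```python
from typing import Any, Dict, List


def flatten_coin(coin: Dict[str, Any]) -> Dict[str, Any]:
    """Turn one nested coin record into a flat row.

    Scalar fields are copied as-is; each entry of the (assumed) `quotes`
    list becomes columns such as `price_USD`.
    """
    row = {k: v for k, v in coin.items() if not isinstance(v, (dict, list))}
    for quote in coin.get("quotes", []):
        currency = quote.get("name", "unknown")
        for key, value in quote.items():
            if key != "name":
                row[f"{key}_{currency}"] = value
    return row


# Hypothetical sample shaped like one element of `data`; values are made up.
sample: List[Dict[str, Any]] = [
    {"id": 1, "name": "ExampleCoin", "symbol": "EXC",
     "quotes": [{"name": "USD", "price": 10.0},
                {"name": "BTC", "price": 0.0002}]},
]
rows = [flatten_coin(c) for c in sample]
```

With real data, `rows = [flatten_coin(c) for c in data]` followed by `pd.DataFrame(rows)` should give one row per coin.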

Upvotes: 1

user11578023

Reputation:

To get/read the data, you need to use the GET method, not POST:

import requests
import pandas as pd
import json

url = "https://api.coinmarketcap.com/data-api/v3/topsearch/rank"

req = requests.get(url)
main_data = req.json()

print(main_data)  # without pretty printing
pretty_json = json.loads(req.text)
print(json.dumps(pretty_json, indent=4))  # with pretty print

Upvotes: 0
