Anton Eitenbichler

Reputation: 7

python lxml scrape price by xpath

I need the price of this cryptocurrency token, https://dex.guru/token/0x68848e1d1ffd7b38d103106c74220c1ad3494afc-bsc, and I'm trying to scrape it with this code:

import requests
from lxml import html

response = requests.get('https://dex.guru/token/0x68848e1d1ffd7b38d103106c74220c1ad3494afc-bsc')
doc = html.fromstring(response.content)
new_releases = doc.xpath('//div[@class="0.00047061210058486165"]/text()')[0]
print(new_releases)

But I get this error: IndexError: list index out of range. I know the error is raised because the list is empty, but why is the list empty? Please help, I'm new to scraping.
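Two things likely make that list empty. First, the XPath predicate @class="..." compares the element's class attribute, not its text, so using the price itself as the class can never match. Second, the price on dex.guru is probably filled in by JavaScript (and the site appears to sit behind Cloudflare), so the HTML that requests downloads may not contain the price at all. The sketch below illustrates only the first point on a small hypothetical snippet; the markup and the class name "token-price" are made up, not dex.guru's real page. It also guards against indexing an empty result.

```python
from lxml import html

# Hypothetical markup: the price is the div's *text*, the class is a CSS name.
snippet = '<div><div class="token-price">0.00047061210058486165</div></div>'
doc = html.fromstring(snippet)

# @class compares the class attribute, so using the price as the class matches nothing.
by_price_as_class = doc.xpath('//div[@class="0.00047061210058486165"]/text()')
print(by_price_as_class)  # []

# Selecting by the (hypothetical) class attribute finds the text node.
by_class = doc.xpath('//div[@class="token-price"]/text()')

# Check for an empty result before indexing, instead of letting [0] raise IndexError.
price = str(by_class[0]) if by_class else None
print(price)  # 0.00047061210058486165
```

On the real page, printing response.text and searching it for the price is a quick way to check whether the value is in the raw HTML at all.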

Upvotes: 0

Views: 364

Answers (1)

ce.teuf

Reputation: 786

I found a solution (an imperfect one for now):

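import cloudscraper

# interpreter: JavaScript engine used to solve Cloudflare's challenge (Node.js here)
# delay: seconds to wait before submitting the challenge answer
scraper = cloudscraper.create_scraper(delay=15, interpreter='nodejs')
url = "https://api.dex.guru/v2/tokens/price"
# token ids in "<address>-<chain>" form, as in the dex.guru page URL
payload = {"ids": [
    "0x68848e1d1ffd7b38d103106c74220c1ad3494afc-bsc",
    "0xbb4cdb9cbd36b01bd1cbaebf2de08d9173bc095c-bsc",
]}
headers = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0"}

resp = scraper.post(url, headers=headers, json=payload)

# when it works
print(resp.json())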

You need to install the 'cloudscraper' package along with a JavaScript interpreter (here I used Node.js). This code is unreliable: sometimes it returns data and sometimes it fails. I will investigate why.
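Since the request only succeeds some of the time, one workaround is to retry with a growing pause between attempts. This is a minimal sketch, assuming a failed attempt surfaces as an exception (for example from resp.raise_for_status() or resp.json()); fetch_with_retries is a hypothetical helper, not part of cloudscraper, and the flaky function only simulates failures.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def fetch_with_retries(fetch: Callable[[], T], attempts: int = 3, base_delay: float = 5.0) -> T:
    """Call fetch(); on an exception wait and retry, doubling the wait each time.

    Re-raises the last exception if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as exc:  # e.g. a failed Cloudflare challenge or a non-JSON body
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(base_delay * 2 ** attempt)
    raise last_error


# Real use (needs network), reusing the names from the answer's code:
#   def get_prices():
#       resp = scraper.post(url, headers=headers, json=payload)
#       resp.raise_for_status()
#       return resp.json()
#   data = fetch_with_retries(get_prices)

# Offline demo: a function that fails twice, then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated failure")
    return {"total": 2}


result = fetch_with_retries(flaky, attempts=3, base_delay=0.01)
print(result, calls["n"])
```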

When it works, it returns:

{'total': 2,
 'data': [{'address': '0x68848e1d1ffd7b38d103106c74220c1ad3494afc',
   'token_price_usd': 0.0003694899811954059,
   'token_price_eth': 5.0304271359669745e-06},
  {'address': '0xbb4cdb9cbd36b01bd1cbaebf2de08d9173bc095c',
   'token_price_usd': 481.9784105344807,
   'token_price_eth': 6.533152208208108}]}
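To pull the prices out of that response, one option is to map each address to its USD price. A minimal sketch, assuming the response keeps the shape shown above; usd_prices is a hypothetical helper and the dictionary is the sample output copied from this answer, not live data.

```python
from typing import Dict

# Sample response copied from the output shown above.
response_json = {
    "total": 2,
    "data": [
        {"address": "0x68848e1d1ffd7b38d103106c74220c1ad3494afc",
         "token_price_usd": 0.0003694899811954059,
         "token_price_eth": 5.0304271359669745e-06},
        {"address": "0xbb4cdb9cbd36b01bd1cbaebf2de08d9173bc095c",
         "token_price_usd": 481.9784105344807,
         "token_price_eth": 6.533152208208108},
    ],
}


def usd_prices(payload: dict) -> Dict[str, float]:
    """Map each token address to its USD price, skipping entries without one."""
    return {
        item["address"]: item["token_price_usd"]
        for item in payload.get("data", [])
        if item.get("token_price_usd") is not None
    }


prices = usd_prices(response_json)
print(prices["0x68848e1d1ffd7b38d103106c74220c1ad3494afc"])
```

With a live request, the same call would be usd_prices(resp.json()).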

It should be possible to write better code by reusing a session and saving the temporary cookies generated by Cloudflare (read the Cloudflare documentation).
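The object returned by cloudscraper.create_scraper is a requests.Session subclass, so any cookies Cloudflare sets end up in scraper.cookies and are reused on later requests from the same object. To keep them across runs, one option is to write them to a file. This is a sketch under assumptions: it is shown with a plain requests.Session so it runs offline, the file name and cookie value are hypothetical, and it stores only name/value pairs (domain and expiry are dropped). Cloudflare clearance cookies are typically tied to the User-Agent that obtained them, so the same headers should be reused.

```python
import json
import os
import tempfile

import requests


def save_cookies(session: requests.Session, path: str) -> None:
    """Write the session's cookies to a JSON file as a name -> value mapping."""
    with open(path, "w") as fh:
        json.dump(requests.utils.dict_from_cookiejar(session.cookies), fh)


def load_cookies(session: requests.Session, path: str) -> None:
    """Load previously saved cookies into the session (no-op if the file is missing)."""
    if not os.path.exists(path):
        return
    with open(path) as fh:
        session.cookies.update(json.load(fh))


# Hypothetical file name; with cloudscraper, pass the scraper object instead of a Session.
path = os.path.join(tempfile.gettempdir(), "dexguru_cookies.json")

first = requests.Session()
first.cookies.set("cf_clearance", "example-token")  # hypothetical value
save_cookies(first, path)

second = requests.Session()
load_cookies(second, path)
print(second.cookies.get("cf_clearance"))
```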

Note that once their official API is released, it should be preferred over this workaround.

Cloudflare may ban you if you run this kind of code in a loop without throttling it with sleep().
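One way to add that sleep() control is a small throttle that enforces a minimum gap between requests. A minimal sketch: Throttle is a hypothetical helper, and the interval used in the demo is tiny only so it runs quickly; a real polling loop would use something much larger (the right value for dex.guru is unknown).

```python
import time
from typing import Optional


class Throttle:
    """Block so that successive wait() calls are at least min_interval seconds apart."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last: Optional[float] = None  # monotonic time of the previous wait()

    def wait(self) -> None:
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


throttle = Throttle(min_interval=0.05)
start = time.monotonic()
for _ in range(3):
    throttle.wait()
    # resp = scraper.post(url, headers=headers, json=payload)  # the real request would go here
elapsed = time.monotonic() - start
print(round(elapsed, 2))
```

Using time.monotonic() rather than time.time() keeps the spacing correct even if the system clock is adjusted while the loop runs.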

Upvotes: 1
