user10620635

Reputation: 27

Python 3: how to scrape data from a span

I'm trying to use Python 3 and BeautifulSoup.

import requests
import json
from bs4 import BeautifulSoup

url = "https://www.binance.com/pl"

# Get the data
data = requests.get(url)

soup = BeautifulSoup(data.text,'lxml')

print(soup)

If I open the HTML code in the browser, I can see: [image: HTML code in browser]

But in my data (printed to the console) I can't see the BTC price: [image: the data I can't see in the console]

Could you give me some advice on how to scrape this data?

Upvotes: 2

Views: 129

Answers (1)

Cohan

Reputation: 4554

Use .findAll() to find all the rows, and then use it again on a given row to find all of that row's cells. You have to look at how the page is structured. It's not a standard HTML table, but a bunch of divs made to look like one. So you have to look at the role attribute of each div to get to the data you want.
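As a small illustration of that structure, here is a minimal sketch run against an invented HTML fragment. The markup and the values in it are hypothetical and only mimic the role="row" / role="gridcell" layout described above; the real page may well differ. It uses find_all(), which is the current spelling of findAll() in Beautiful Soup 4.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for illustration only.
# It imitates a "table" built from divs with role attributes.
html = """
<div role="grid">
  <div role="row">
    <div role="gridcell">*</div>
    <div role="gridcell">ADA/BTC</div>
    <div role="gridcell">0.00001234</div>
  </div>
  <div role="row">
    <div role="gridcell">*</div>
    <div role="gridcell">ADX/BTC</div>
    <div role="gridcell">0.00005678</div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Each "row" is a div with role="row"; its "cells" are divs with role="gridcell".
table = []
for row in soup.find_all("div", {"role": "row"}):
    cells = row.find_all("div", {"role": "gridcell"})
    table.append([cell.get_text(strip=True) for cell in cells])

for cell_values in table:
    print(cell_values)
```

Each inner list holds the text of one row's cells in order, so index 0 is the star cell and index 1 is the pair name in this made-up fragment.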

I'm assuming that you're going to want to look at specific rows, so my example uses the Para column to find those rows. Since the star is in its own little cell, the Para column is the second cell, at index 1. With that, it's just a question of which cells you want to export.

You could take out the filter if you want to get everything. You can also modify it to check whether the value of a cell is above a certain price point.

# Import necessary libraries
import requests
from bs4 import BeautifulSoup
# Ignore the insecure warning
from requests.packages.urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

# Set options and which rows you want to look at
url = "https://www.binance.com/pl"
desired_rows = ['ADA/BTC', 'ADX/BTC']

# Get the page and convert it into beautiful soup
response = requests.get(url, verify=False)
soup = BeautifulSoup(response.text, 'html.parser')

# Find all table rows
rows = soup.findAll('div', {'role':'row'})

# Process all the rows in the table
for row in rows:
    try:
        # Get the cells for the given row
        cells = row.findAll('div', {'role':'gridcell'})
        # Convert them to just the values of the cell, ignoring attributes
        cell_values = [c.text for c in cells]

        # see if the row is one you want
        if cell_values[1] in desired_rows:
            # Output the data however you'd like
            print(cell_values[1], cell_values[-1])

    except IndexError: # there was a row without cells
        pass

This resulted in the following output:

ADA/BTC 1,646.39204255
ADX/BTC 35.29384873
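The threshold variation mentioned earlier could look roughly like the sketch below. It assumes the rows have already been collected as lists of cell strings, in the same shape as cell_values in the script. The helper names parse_number and rows_above are hypothetical, and the sample rows reuse only the two values from the output above plus two made-up edge cases.

```python
def parse_number(text):
    """Convert a cell string such as '1,646.39204255' to a float."""
    return float(text.replace(",", ""))


def rows_above(rows, threshold, value_index=-1):
    """Return the rows whose numeric cell at value_index exceeds threshold."""
    selected = []
    for cell_values in rows:
        try:
            value = parse_number(cell_values[value_index])
        except (IndexError, ValueError):
            # Row has no cells, or the cell does not contain a number
            continue
        if value > threshold:
            selected.append(cell_values)
    return selected


# Sample rows shaped like the script's cell_values lists.
# The first cell stands in for the star cell; the last two rows are
# hypothetical edge cases (an empty row and a non-numeric cell).
sample_rows = [
    ["", "ADA/BTC", "1,646.39204255"],
    ["", "ADX/BTC", "35.29384873"],
    [],
    ["", "Para", "-"],
]

for row in rows_above(sample_rows, threshold=100):
    print(row[1], row[-1])
```

Stripping the thousands separator before calling float() matters here, since values such as 1,646.39204255 would otherwise raise a ValueError and be skipped.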

Upvotes: 1
