Reputation: 27
I am trying to scrape a page using Python 3 and BeautifulSoup.
import requests
import json
from bs4 import BeautifulSoup
url = "https://www.binance.com/pl"
# Get the data
data = requests.get(url)
soup = BeautifulSoup(data.text,'lxml')
print(soup)
When I inspect the page's HTML in the browser, I can see the BTC price (screenshot: HTML code in browser).
But in the data I print to the console, I can't see the BTC price (screenshot: data missing from the console output).
Could you give me some advice on how to scrape this data?
Upvotes: 2
Views: 129
Reputation: 4554
Use .findAll() to find all the rows, and then use it again to find all the cells in a given row. You have to look at how the page is structured. It's not a standard table, but a bunch of divs made to look like one. So you have to look at the role attribute of each div to get to the data you want.
I'm assuming that you're going to want to look at specific rows, so my example uses the Para column to find those rows. Since the star is in its own little cell, the Para column is the second cell, at index 1. With that, it's just a question of which cells you want to export.
You could take out the filter if you want to get everything. You can also modify it to see if the value of a cell is above a certain price point.
# Import necessary libraries
import requests
from bs4 import BeautifulSoup

# Ignore the insecure warning
from requests.packages.urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

# Set options and which rows you want to look at
url = "https://www.binance.com/pl"
desired_rows = ['ADA/BTC', 'ADX/BTC']

# Get the page and convert it into beautiful soup
response = requests.get(url, verify=False)
soup = BeautifulSoup(response.text, 'html.parser')

# Find all table rows
rows = soup.findAll('div', {'role': 'row'})

# Process all the rows in the table
for row in rows:
    try:
        # Get the cells for the given row
        cells = row.findAll('div', {'role': 'gridcell'})
        # Convert them to just the values of the cell, ignoring attributes
        cell_values = [c.text for c in cells]
        # See if the row is one you want
        if cell_values[1] in desired_rows:
            # Output the data however you'd like
            print(cell_values[1], cell_values[-1])
    except IndexError:  # there was a row without cells
        pass
This resulted in the following output:
ADA/BTC 1,646.39204255
ADX/BTC 35.29384873
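As for checking whether a cell's value is above a certain price point, here is a minimal sketch of one way it could be done. It assumes the cell text uses a comma as the thousands separator, as in the output above. The helper names parse_price and rows_above are made up for illustration, and the sample rows are trimmed to three cells rather than taken from the real page.

```python
from decimal import Decimal, InvalidOperation


def parse_price(text):
    """Convert a cell string such as '1,646.39204255' to a Decimal, or None."""
    # Assumption: commas are thousands separators, as in the output above
    cleaned = text.replace(",", "").strip()
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None  # the cell did not hold a number


def rows_above(rows, threshold, name_index=1, price_index=-1):
    """Yield (name, price) for every row whose price exceeds the threshold."""
    limit = Decimal(str(threshold))
    for cell_values in rows:
        if len(cell_values) <= name_index:
            continue  # row without enough cells, e.g. a header row
        price = parse_price(cell_values[price_index])
        if price is not None and price > limit:
            yield cell_values[name_index], price


# Hypothetical sample rows shaped like the cell_values lists built in the loop
sample_rows = [
    ["", "ADA/BTC", "1,646.39204255"],
    ["", "ADX/BTC", "35.29384873"],
    [],  # a row without cells
]

matches = list(rows_above(sample_rows, 100))
for name, price in matches:
    print(name, price)
```

To use it with the scraper, you could collect each cell_values list inside the loop and pass the collected lists to rows_above instead of comparing against desired_rows.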
Upvotes: 1