Seth Puhala

Why is Beautiful Soup not finding the HTML element I am looking for?

I am trying to get cryptocurrency price changes from Coinbase by parsing the page with Beautiful Soup. On the Coinbase website (https://www.coinbase.com/price/ethereum), I can find the HTML element for the price change:

<h4 class="TextElement__Spacer-hxkcw5-0 caIgfs Header__StyledHeader-sc-1xiyexz-0 dLILyj">+0.33%</h4>

Then, in Python, I use Beautiful Soup to search all the h4 tags. It finds other h4 tags, but not the one I am looking for:

import requests
from bs4 import BeautifulSoup

result = requests.get("https://www.coinbase.com/price/ethereum")
src = result.content
soup = BeautifulSoup(src, "html.parser")
tags = soup.find_all("h4")
print(tags)

Answers (1)

Andrej Kesely

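The price change you see in the browser is not present as an <h4> element in the HTML that requests downloads. Instead, the data is embedded in the page inside a <script> tag as JSON. You can use the json module to parse it.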
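To confirm this yourself, keep in mind that the browser's element inspector shows the page after its JavaScript has run, while requests only receives the original HTML. Below is a minimal sketch for checking where a piece of text appears in that original HTML. The helper name locate_text and the sample markup are hypothetical; the sample only imitates the general shape of such a page.

```python
from bs4 import BeautifulSoup


def locate_text(html: str, needle: str) -> dict:
    """Report where `needle` occurs in a static (pre-JavaScript) HTML document."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        # Anywhere in the raw markup the server sent
        "in_raw_html": needle in html,
        # Inside the text of any <h4> element
        "in_h4": any(needle in h4.get_text() for h4 in soup.find_all("h4")),
        # Inside the body of any <script> element (.string is None if empty)
        "in_script": any(needle in (s.string or "") for s in soup.find_all("script")),
    }


# Hypothetical, simplified page: the price data sits in a JSON <script>,
# and no <h4> holds the price change.
sample = """
<html><body>
  <h4>Some other heading</h4>
  <script id="server-app-state" type="application/json">
    {"percentChange": {"day": -0.0025}}
  </script>
</body></html>
"""

print(locate_text(sample, "percentChange"))
# {'in_raw_html': True, 'in_h4': False, 'in_script': True}
```

On the real page you would pass requests.get(url).text as html. If the price text you saw in the inspector is not in the raw HTML, it was most likely inserted by JavaScript. The value itself changes over time, so search shortly after viewing it.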

For example:

import json
import requests
from bs4 import BeautifulSoup


url = 'https://www.coinbase.com/price/ethereum'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

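# find <script id="server-app-state"> and decode the JSON string inside it
data = json.loads(soup.select_one('script#server-app-state').contents[0])

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

# drill down through the nested dictionaries to the price changes
print(data['initialData']['data']['prices']['prices']['latestPrice']['percentChange'])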

Prints:

{'hour': 0.0038781959207123133, 'day': -0.0025064363163135772, 'week': -0.02360650279511788, 'month': 0.13293312491891887, 'year': -0.10963199613423964}
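The values above look like fractions rather than percentages (for example, -0.0025 would be -0.25%), but that is an assumption based only on their magnitude, so compare one of them with the figure shown on the page before relying on it. Under that assumption, here is a small sketch (format_changes is a hypothetical helper) that turns the dictionary into strings like the "+0.33%" from the question:

```python
def format_changes(percent_change: dict) -> dict:
    """Format each change as a signed percentage string.

    ASSUMPTION: the values are fractions (0.01 == 1%), not percentages.
    """
    # The "%" format spec multiplies by 100; "+" forces an explicit sign.
    return {period: f"{value:+.2%}" for period, value in percent_change.items()}


changes = {
    "hour": 0.0038781959207123133,
    "day": -0.0025064363163135772,
    "week": -0.02360650279511788,
}
print(format_changes(changes))
# {'hour': '+0.39%', 'day': '-0.25%', 'week': '-2.36%'}
```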

EDIT:

The line data = json.loads(soup.select_one('script#server-app-state').contents[0]) does three things:

1.) It selects the element <script id="server-app-state">...</script> from the soup.

2.) The content of this tag is a JSON string, so I decode it with json.loads().

3.) The result is stored in the variable data (a Python dictionary).
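The three steps above can be sketched as a reusable function. This version reads the tag with .string instead of .contents[0] and raises a clear error when the tag is missing (select_one() returns None in that case, which would otherwise surface as an AttributeError). The function name extract_app_state and the sample HTML are illustrative assumptions, not part of the original answer; the id server-app-state comes from the answer and may no longer match the live site.

```python
import json

from bs4 import BeautifulSoup


def extract_app_state(html: str, script_id: str = "server-app-state") -> dict:
    """Find <script id=...>, decode its JSON body, and return the result."""
    soup = BeautifulSoup(html, "html.parser")

    # Step 1: select the <script> element by tag name and id.
    tag = soup.select_one(f"script#{script_id}")
    if tag is None or tag.string is None:
        # Nothing matched (e.g. the page layout changed) or the tag is empty.
        raise ValueError(f"no <script id={script_id!r}> with text content found")

    # Steps 2 and 3: the tag's text is a JSON string; json.loads() turns a
    # JSON object into a plain Python dict.
    return json.loads(tag.string)


# Hypothetical sample markup standing in for the downloaded page.
sample = """
<script id="server-app-state" type="application/json">
  {"initialData": {"data": {"price": 1}}}
</script>
"""

data = extract_app_state(sample)
print(type(data).__name__, data["initialData"]["data"]["price"])  # dict 1
```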

The line print(data['initialData']['data']['prices']['prices']['latestPrice']['percentChange']) just prints one nested value from this dictionary. To see the complete content of the dictionary, uncomment the line print(json.dumps(data, indent=4)).
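Chained indexing like data['initialData'][...] raises a KeyError as soon as any key is missing, which is likely if the site's page structure changes. Below is a small sketch of a hypothetical helper, get_path, that walks the same key path and returns a default instead; it also pretty-prints only one branch of the dictionary rather than all of it. The nested sample dict is a made-up stand-in for the real decoded data.

```python
import json
from typing import Any


def get_path(data: Any, *keys: str, default: Any = None) -> Any:
    """Follow `keys` through nested dicts; return `default` if any step fails."""
    current = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Hypothetical stand-in for the decoded page state.
data = {
    "initialData": {
        "data": {"prices": {"prices": {"latestPrice": {"percentChange": {"day": -0.0025}}}}}
    }
}

path = ("initialData", "data", "prices", "prices", "latestPrice", "percentChange")
print(get_path(data, *path))                  # {'day': -0.0025}
print(get_path(data, "initialData", "nope"))  # None

# Pretty-print one branch instead of the whole (large) dictionary.
print(json.dumps(get_path(data, "initialData", "data"), indent=4))
```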
