Reputation: 31
I have recently started learning Python and how to parse websites using BeautifulSoup.
I am now stuck on the following problem.
HTML (as returned by BeautifulSoup):
<div class="mod-3-piece-app__visual-container__chart">
<div class="mod-ui-chart--dynamic" data-chart-config='{"chartData":{"periods":[{"year":2013,"period":null,"periodicity":"A","icon":null},{"year":2014,"period":null,"periodicity":"A","icon":null},{"year":2015,"period":null,"periodicity":"A","icon":null},{"year":2016,"period":null,"periodicity":"A","icon":null},{"year":2017,"period":null,"periodicity":"A","icon":null},{"year":2018,"period":null,"periodicity":"A","icon":null}],"forecastRange":{"from":3.5,"to":5.5},"actualValues":[5.6785,6.45,9.22,8.31,null,null],"consensusData":[{"y":5.6307,"toolTipData":{"low":5.5742,"high":5.7142,"analysts":34,"restatement":null}},{"y":6.3434,"toolTipData":{"low":6.25,"high":6.5714,"analysts":35,"restatement":null}},{"y":9.1265,"toolTipData":{"low":9.02,"high":9.28,"analysts":40,"restatement":null}},{"y":8.2734,"toolTipData":{"low":8.17,"high":8.335,"analysts":40,"restatement":null}},{"y":8.9304,"toolTipData":{"low":8.53,"high":9.63,"analysts":41,"restatement":null}},{"y":10.1252,"toolTipData":{"low":8.63,"high":11.61,"analysts":42,"restatement":null}}]}}'>
<noscript>
<div class="mod-ui-chart--static">
<div class="mod-ui-chart--sprited" style="width:410px; height:135px; background:url('/data/Charts/EquityForecast?issueID=36276&height=135&width=410') 0px -270px no-repeat;">
</div>
</div>
</noscript>
</div>
</div>
My code:
from bs4 import BeautifulSoup
import urllib.request
data = []
List = ['AAPL']
# Iterates Through List
for i in List:
    # The webpage which we wish to Parse
    soup = BeautifulSoup(urllib.request.urlopen('https://markets.ft.com/data/equities/tearsheet/forecasts?s=AAPL:NSQ').read(), 'lxml')
    # Gathering the data
    Values = soup.find_all("div", {"class":"mod-3-piece-app__visual-container__chart"})[4]
    print(Values)
    # Getting desired values from data
What I want to extract are the values after {"y": ...,
i.e. the numbers 5.6307, 6.3434, 9.1265, 8.2734, 8.9304 and 10.1252,
but I can't for the life of me figure out how. I have tried Values.get_text
as well as Values.text,
but both just give a blank result (probably because all of the data is inside a list or something similar).
If I could just get the data after "toolTipData", that would be fine as well.
Would anyone mind helping me out?
If I've missed anything, please provide feedback so I can ask a better question in the future.
Thank you
Upvotes: 1
Views: 664
Reputation: 5157
In short, the data you want is stored inside an attribute of a tag, not in the tag's text.
All I had to do was use find_all to look for the right class attribute, mod-ui-chart--dynamic, fetch the content of its data-chart-config attribute using .get(), check that it contains 'actualValues', then load it as JSON and navigate through its values.
Try the following piece of code. I've commented it, so you should be able to understand what it's doing.
Code:
from bs4 import BeautifulSoup
import urllib.request
import json
data = []
List = ['AAPL']
# Iterates Through List
for i in List:
    # The webpage which we wish to Parse
    soup = BeautifulSoup(urllib.request.urlopen('https://markets.ft.com/data/equities/tearsheet/forecasts?s=AAPL:NSQ').read(), 'lxml')
    # Gathering the data
    elemList = soup.find_all('div', {'class':'mod-ui-chart--dynamic'})
    #we will get the attribute info of each `data-chart-config` tag, inside each `div` with `class=mod-ui-chart--dynamic`
    for elem in elemList:
        elemID = elem.get('class')
        elemName = elem.get('data-chart-config')
        #if there's no value in elemName, pass...
        if elemName is None:
            pass
        #if the term 'actualValues' exists in elemName
        elif 'actualValues' in elemName:
            #print('Extracting actualValues from:\n')
            #print("Attribute id = %s" % elemID)
            #print()
            #print("Attribute name = %s" % elemName)
            #print()
            #reading `data-chart-config` attribute as a json
            data = json.loads(elemName)
            #print(json.dumps(data, indent=4, sort_keys=True))
            #print(data['chartData']['actualValues'])
            #fetching desired info
            val1 = data['chartData']['actualValues'][0]
            val2 = data['chartData']['actualValues'][1]
            val3 = data['chartData']['actualValues'][2]
            val4 = data['chartData']['actualValues'][3]
            #printing desired values
            print(val1, val2, val3, val4)
            print('-'*15)
Output:
1.9 1.42 1.67 3.36
---------------
5.6785 6.45 9.22 8.31
---------------
50557000000 42358000000 46852000000 78351000000
---------------
170910000000 182795000000 233715000000 215639000000
---------------
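p.s.1: if you want, you can uncomment the print() calls inside the elif branch to see what the program is doing at each step.
p.s.2: if you want, you can change 'actualValues' to 'consensusData' in lines such as val1 = data['chartData']['actualValues'][0]. Note that each 'consensusData' entry is an object rather than a plain number, so the figure asked about in the question sits under its 'y' key, e.g. data['chartData']['consensusData'][0]['y'].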
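To collect the "y" figures the question asks for, the consensusData list can be walked directly once the attribute has been parsed as JSON. The following is a minimal sketch, not part of the original answer: the helper name consensus_y_values is hypothetical, and the sample string is a shortened copy of the data-chart-config value shown in the question. It uses only the standard json module, so it runs without fetching the page.

```python
import json


def consensus_y_values(chart_config):
    """Return the "y" figure of every consensusData entry.

    chart_config is the raw string stored in a data-chart-config
    attribute (hypothetical helper, for illustration only).
    """
    config = json.loads(chart_config)
    # Some charts may not carry consensusData, so default to an empty list.
    entries = config["chartData"].get("consensusData", [])
    return [entry["y"] for entry in entries]


# Shortened sample of the attribute value from the question (first two years only).
sample = (
    '{"chartData":{"actualValues":[5.6785,6.45],"consensusData":['
    '{"y":5.6307,"toolTipData":{"low":5.5742,"high":5.7142,"analysts":34,"restatement":null}},'
    '{"y":6.3434,"toolTipData":{"low":6.25,"high":6.5714,"analysts":35,"restatement":null}}]}}'
)

print(consensus_y_values(sample))  # prints [5.6307, 6.3434]
```

In the loop above, the same helper could be called as consensus_y_values(elemName) for each element whose attribute is not None. Replacing entry["y"] with entry["toolTipData"] would return the low/high/analysts details instead.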
Upvotes: 1