Cedric Vongheer

Reputation: 31

Get specific values from BeautifulSoup Parsing

I have recently started learning Python and how to parse websites using BeautifulSoup.

I am now stuck on the following problem.

HTML (as returned by soup):

<div class="mod-3-piece-app__visual-container__chart">
    <div class="mod-ui-chart--dynamic" data-chart-config='{"chartData":{"periods":[{"year":2013,"period":null,"periodicity":"A","icon":null},{"year":2014,"period":null,"periodicity":"A","icon":null},{"year":2015,"period":null,"periodicity":"A","icon":null},{"year":2016,"period":null,"periodicity":"A","icon":null},{"year":2017,"period":null,"periodicity":"A","icon":null},{"year":2018,"period":null,"periodicity":"A","icon":null}],"forecastRange":{"from":3.5,"to":5.5},"actualValues":[5.6785,6.45,9.22,8.31,null,null],"consensusData":[{"y":5.6307,"toolTipData":{"low":5.5742,"high":5.7142,"analysts":34,"restatement":null}},{"y":6.3434,"toolTipData":{"low":6.25,"high":6.5714,"analysts":35,"restatement":null}},{"y":9.1265,"toolTipData":{"low":9.02,"high":9.28,"analysts":40,"restatement":null}},{"y":8.2734,"toolTipData":{"low":8.17,"high":8.335,"analysts":40,"restatement":null}},{"y":8.9304,"toolTipData":{"low":8.53,"high":9.63,"analysts":41,"restatement":null}},{"y":10.1252,"toolTipData":{"low":8.63,"high":11.61,"analysts":42,"restatement":null}}]}}'>
        <noscript>
            <div class="mod-ui-chart--static">
                <div class="mod-ui-chart--sprited" style="width:410px; height:135px; background:url('/data/Charts/EquityForecast?issueID=36276&amp;height=135&amp;width=410') 0px -270px no-repeat;">
                </div>
            </div>
        </noscript>
    </div>
</div>

My code:

from bs4 import BeautifulSoup
import urllib.request


data = []
List = ['AAPL']

# Iterates Through List
for i in List :   
    # The webpage which we wish to Parse
    soup = BeautifulSoup(urllib.request.urlopen('https://markets.ft.com/data/equities/tearsheet/forecasts?s=AAPL:NSQ').read(), 'lxml')

    # Gathering the data
    Values = soup.find_all("div", {"class":"mod-3-piece-app__visual-container__chart"})[4]
    print(Values)

    # Getting desired values from data

What I want are the values after "y", i.e. the numbers 5.6307, 6.3434, 9.1265, 8.2734, 8.9304 and 10.1252, but I can't for the life of me figure out how to get them. I have tried Values.get_text() as well as Values.text, but both just give a blank result (probably because all of the data is inside a list or something similar).

If I could just get the data after "toolTipData", that would be fine as well.

Would anyone mind helping me out?

If I've missed anything, please let me know so I can write a better question in the future.

Thank you

Upvotes: 1

Views: 664

Answers (1)

dot.Py

Reputation: 5157

In short, you want to get information that is stored inside a tag attribute (data-chart-config), not in the element's text.
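To illustrate why .text comes back blank, here is a minimal sketch using a shortened copy of the markup from the question. The names html, div, text and config are only for illustration, and it uses the built-in 'html.parser' instead of 'lxml' so it runs without extra parsers; the real page may behave differently.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical copy of the element from the question
html = (
    '<div class="mod-ui-chart--dynamic" '
    'data-chart-config=\'{"chartData":{"actualValues":[5.6785,6.45]}}\'>'
    '<noscript></noscript></div>'
)

soup = BeautifulSoup(html, 'html.parser')
div = soup.find('div', class_='mod-ui-chart--dynamic')

# The element has no text content, so get_text() returns an empty string
text = div.get_text(strip=True)

# The data lives in the attribute, which .get() returns as a plain string
config = div.get('data-chart-config')

print(repr(text))
print(config)
```

So the JSON you are after never shows up in .text; it has to be read from the attribute and then parsed.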

All I had to do was:

  1. open the web page source to see where your info is located
  2. use find_all to look for the right class attribute, mod-ui-chart--dynamic
  3. for each element returned by find_all, fetch its data-chart-config attribute content using .get()
  4. search inside the attribute content string for the term 'actualValues'
  5. if 'actualValues' is found, load the string as JSON and navigate through its values.

Try the following piece of code. I've commented it, so you should be able to understand what it's doing.

Code:

from bs4 import BeautifulSoup
import urllib.request
import json

data = []
List = ['AAPL']

# Iterates Through List
for i in List:   
    # The webpage which we wish to Parse
    soup = BeautifulSoup(urllib.request.urlopen('https://markets.ft.com/data/equities/tearsheet/forecasts?s=AAPL:NSQ').read(), 'lxml')

    # Gathering the data
    elemList = soup.find_all('div', {'class':'mod-ui-chart--dynamic'})

    #we will read the `data-chart-config` attribute of each `div` with `class=mod-ui-chart--dynamic`
    for elem in elemList:

        elemID = elem.get('class')
        elemName = elem.get('data-chart-config')

        #if there's no value in elemName, pass...
        if elemName is None:
            pass

        #if the term 'actualValues' exists in elemName 
        elif 'actualValues' in elemName:
            #print('Extracting actualValues from:\n')
            #print("Attribute id = %s" % elemID)
            #print()
            #print("Attribute name = %s" % elemName)
            #print()

            #parsing the `data-chart-config` attribute string as JSON
            data = json.loads(elemName)

            #print(json.dumps(data, indent=4, sort_keys=True))
            #print(data['chartData']['actualValues'])

            #fetching desired info
            val1 = data['chartData']['actualValues'][0]
            val2 = data['chartData']['actualValues'][1]
            val3 = data['chartData']['actualValues'][2]
            val4 = data['chartData']['actualValues'][3]

            #printing desired values
            print(val1, val2, val3, val4)

            print('-'*15)

Output:

1.9 1.42 1.67 3.36
---------------
5.6785 6.45 9.22 8.31
---------------
50557000000 42358000000 46852000000 78351000000
---------------
170910000000 182795000000 233715000000 215639000000
---------------

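p.s.1: if you want, you can uncomment the print() calls inside the elif branch to see what the program is doing.

p.s.2: if you want the consensus figures instead, change 'actualValues' in val1 = data['chartData']['actualValues'][0] to 'consensusData'. Each entry there is a dict rather than a number, so the value you asked about is under its 'y' key, e.g. data['chartData']['consensusData'][0]['y'].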
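As a rough sketch of that last point, the snippet below pulls the "y" values and the "toolTipData" entries out of the attribute string. raw_config is an abbreviated copy of the data-chart-config value from the question (only the first two consensus entries), and the helper names consensus_values and tooltip_data are made up for this example.

```python
import json

# Abbreviated copy of the data-chart-config attribute from the question
raw_config = (
    '{"chartData":{"actualValues":[5.6785,6.45],"consensusData":['
    '{"y":5.6307,"toolTipData":{"low":5.5742,"high":5.7142,"analysts":34,"restatement":null}},'
    '{"y":6.3434,"toolTipData":{"low":6.25,"high":6.5714,"analysts":35,"restatement":null}}'
    ']}}'
)


def consensus_values(config_text):
    """Return the 'y' value of every consensusData entry."""
    chart = json.loads(config_text)['chartData']
    return [point['y'] for point in chart.get('consensusData', [])]


def tooltip_data(config_text):
    """Return the 'toolTipData' dict of every consensusData entry."""
    chart = json.loads(config_text)['chartData']
    return [point['toolTipData'] for point in chart.get('consensusData', [])]


print(consensus_values(raw_config))
print(tooltip_data(raw_config)[0]['low'])
```

In the loop above you would pass elemName instead of raw_config. Using .get('consensusData', []) is just a guard in case one of the charts on the page has no consensus data.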

Upvotes: 1
