jp.B

Reputation: 13

Python list / Beautiful Soup web scraping question

I am completely new to Python and to programming in general. At the moment I am experimenting with the Beautiful Soup library, and I tried to extract some fund data from a website. In the end I got a list with all the data I am interested in (top holdings, top countries and top sectors). For each of these categories I got a list (or rather a bs4.element.ResultSet) like this:

[<div class="fw--chart fwwBreakdown" data-breakdown='{"series":[{"name":"APPLE INC","data":[3.43]},
{"name":"Microsoft Corp","data":[2.77]},{"name":"AMAZON COM INC","data":[2.18]},{"name":"ALPHABET INC
 CL C","data":[1.04]},{"name":"FACEBOOK CLASS A INC","data":[1.03]},{"name":"Alphabet Inc Class 
A","data":[0.99]},{"name":"Taiwan Semiconductor Manufacturing Co Ltd","data":[0.88]},{"name":"Tesla 
Motors Inc.","data":[0.83]},{"name":"Tencent Holdings Ltd.","data":[0.82]},{"name":"JPMORGAN CHASE  
CO","data":[0.76]}]}' id="fund-topholdings"> </div>,

My problem: the output above is only one element of my list. The next element looks similar but holds the data for countries, and a further element holds the data for sectors.

What is the best way to get the asset names (Apple, Microsoft, ...) and the percentages (3.43, 2.77, ...) into a list or a pandas DataFrame so that I can work with them?

The whole code so far is:

from bs4 import BeautifulSoup
import requests
import pandas as pd
asset_isin = "IE00BGHQ0G80"
url = f"https://www.fondsweb.com/de/{asset_isin}"
page = requests.get(url)
soup = BeautifulSoup(page.content, "html.parser")
data = soup.find_all("div", attrs={"class":"fw--chart fwwBreakdown"})
top_holdings = data[0]
top_countries = data[1]
top_sectors = data[2]

So data[0] gives me the output shown above, starting with <div class=..., but all of it comes back as one single element.

Thanks in advance

Upvotes: 1

Views: 68

Answers (1)

user5386938

I am not sure exactly what you need, but see the following:

# coding: UTF-8
import pandas as pd
from bs4 import BeautifulSoup
import requests
import json

asset_isin = "IE00BGHQ0G80"
url = f"https://www.fondsweb.com/de/{asset_isin}"
page = requests.get(url)
soup = BeautifulSoup(page.content, "html.parser")
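# Select every chart div that carries both CSS classes
charts = soup.select('div.fw--chart.fwwBreakdown')

# Collect names and percentages from all charts into two parallel lists
data = {'name': [], 'data': []}
for d in charts:
    # The chart data is a JSON string stored in the data-breakdown attribute
    o = json.loads(d['data-breakdown'])
    for s in o['series']:
        data['name'].append(s['name'])
        # Each series holds its percentage as a one-element list
        data['data'].append(s['data'][0])

# Note: holdings, countries and sectors all end up in the same DataFrame
df = pd.DataFrame(data)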

print(df)
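
If you would rather keep holdings, countries and sectors apart, one possible variation is to build one DataFrame per chart and key it by the div's id. This is only a sketch under a few assumptions: it parses a shortened, hard-coded HTML sample instead of the live page, the function name breakdown_frames and the column name "percent" are my own choices, and the second div (its id "fund-topcountries" and its values) is invented purely for illustration, since only the id "fund-topholdings" appears in the question.

```python
import json

import pandas as pd
from bs4 import BeautifulSoup

# Shortened sample markup. The first div uses values from the question;
# the second div (id and numbers) is a made-up placeholder for illustration.
SAMPLE_HTML = """
<div class="fw--chart fwwBreakdown" id="fund-topholdings"
     data-breakdown='{"series":[{"name":"APPLE INC","data":[3.43]},{"name":"Microsoft Corp","data":[2.77]}]}'> </div>
<div class="fw--chart fwwBreakdown" id="fund-topcountries"
     data-breakdown='{"series":[{"name":"Example Country","data":[50.0]}]}'> </div>
"""


def breakdown_frames(html):
    """Return a dict mapping each chart div's id to a DataFrame (name, percent)."""
    soup = BeautifulSoup(html, "html.parser")
    frames = {}
    for div in soup.select("div.fw--chart.fwwBreakdown"):
        # The attribute value is a JSON string; decode it and take the series list
        series = json.loads(div["data-breakdown"])["series"]
        # Each series entry stores its percentage as a one-element list
        rows = [{"name": s["name"], "percent": s["data"][0]} for s in series]
        frames[div.get("id")] = pd.DataFrame(rows)
    return frames


frames = breakdown_frames(SAMPLE_HTML)
print(frames["fund-topholdings"])
```

With the real page you would pass page.content instead of SAMPLE_HTML and then look at frames.keys() to see which ids the site actually uses.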

Upvotes: 1
