Reputation: 13
I am completely new to Python and to programming in general. At the moment I am experimenting with the Beautiful Soup library, and I tried to extract some fund data from a website. In the end I got a list with all the data I am interested in (top holdings, top countries and top sectors). For these categories I got a list (more precisely, a bs4.element.ResultSet) like this:
[<div class="fw--chart fwwBreakdown" data-breakdown='{"series":[{"name":"APPLE INC","data":[3.43]},
{"name":"Microsoft Corp","data":[2.77]},{"name":"AMAZON COM INC","data":[2.18]},{"name":"ALPHABET INC
CL C","data":[1.04]},{"name":"FACEBOOK CLASS A INC","data":[1.03]},{"name":"Alphabet Inc Class
A","data":[0.99]},{"name":"Taiwan Semiconductor Manufacturing Co Ltd","data":[0.88]},{"name":"Tesla
Motors Inc.","data":[0.83]},{"name":"Tencent Holdings Ltd.","data":[0.82]},{"name":"JPMORGAN CHASE
CO","data":[0.76]}]}' id="fund-topholdings"> </div>,
My problem: the output above is only one element of my list. The next element looks similar but holds the data for countries, and a further element holds the sectors.
What is the best way to get the asset names (Apple, Microsoft, ...) and the percentages (3.43, 2.77, ...) into a list or a pandas DataFrame so that I can work with them?
The whole code so far is:
from bs4 import BeautifulSoup
import requests
import pandas as pd
asset_isin = "IE00BGHQ0G80"
url = f"https://www.fondsweb.com/de/{asset_isin}"
page = requests.get(url)
soup = BeautifulSoup(page.content, "html.parser")
data = soup.find_all("div", attrs={"class":"fw--chart fwwBreakdown"})
top_holdings = data[0]
top_countries = data[1]
top_sectors = data[2]
So data[0] gives me the output above, starting with <div class=..., but everything sits in that single element [0].
Thanks in advance
Upvotes: 1
Views: 68
I am unsure what exactly you need, but see the following:
# coding: UTF-8
import pandas as pd
from bs4 import BeautifulSoup
import requests
import json
asset_isin = "IE00BGHQ0G80"
url = f"https://www.fondsweb.com/de/{asset_isin}"
page = requests.get(url)
soup = BeautifulSoup(page.content, "html.parser")
charts = soup.select('div.fw--chart.fwwBreakdown')
data = {'name': [], 'data': []}
for d in charts:
    # The data-breakdown attribute holds a JSON string
    o = json.loads(d['data-breakdown'])
    for s in o['series']:
        data['name'].append(s['name'])
        data['data'].append(s['data'][0])
df = pd.DataFrame(data)
print(df)
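The code above puts holdings, countries and sectors into one DataFrame. If you would rather keep them apart, a minimal sketch along the following lines might help. It builds one DataFrame per chart and keys it by the div's id attribute. Assumptions: the helper name breakdowns_by_id and the column name percent are my own, the first div in the sample is shortened from your question, and the second div (including its id and values) is a made-up placeholder, since I have not checked which ids the real page uses for countries and sectors.

```python
import json

import pandas as pd
from bs4 import BeautifulSoup

# Sample markup so the sketch runs without a network request.
# The first div is shortened from the question; the second div, its id
# and its values are PLACEHOLDERS invented for illustration only.
SAMPLE_HTML = """
<div class="fw--chart fwwBreakdown" id="fund-topholdings"
     data-breakdown='{"series":[{"name":"APPLE INC","data":[3.43]},{"name":"Microsoft Corp","data":[2.77]}]}'> </div>
<div class="fw--chart fwwBreakdown" id="placeholder-chart"
     data-breakdown='{"series":[{"name":"Placeholder A","data":[60.0]},{"name":"Placeholder B","data":[40.0]}]}'> </div>
"""


def breakdowns_by_id(html):
    """Return a dict mapping each chart div's id to a DataFrame (name, percent)."""
    soup = BeautifulSoup(html, "html.parser")
    frames = {}
    for div in soup.select("div.fw--chart.fwwBreakdown"):
        # The attribute value is a JSON string with a "series" list
        series = json.loads(div["data-breakdown"])["series"]
        rows = [{"name": s["name"], "percent": s["data"][0]} for s in series]
        frames[div.get("id")] = pd.DataFrame(rows)
    return frames


frames = breakdowns_by_id(SAMPLE_HTML)
print(frames["fund-topholdings"])
```

With the live page you would pass page.content instead of SAMPLE_HTML and then look at frames.keys() to see which ids the site actually provides.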
Upvotes: 1