Reputation: 81
I'm having trouble converting the list into a dataframe with the code below:
from bs4 import BeautifulSoup
import requests
page = requests.get("http://investors.morningstar.com/ownership/shareholders-overview.html?t=AAPL&region=idn&culture=en-US&ownerCountry=USA")
soup = BeautifulSoup(page.content, 'lxml')
quote = soup.find('table', class_='r_table2 text2 print97').find_all('tr')
for row in quote:
    cols = row.find_all('td')
    cols = [x.text.strip() for x in cols]
    print(cols)
Output:
['Name', '', 'Ownership TrendPrevious 8 Qtrs', 'Shares', 'Change', '% TotalShares Held', '% TotalAssets', '', 'Date']
['']
['Russell Inv Tax-Managed DI Large Cap SMA', '', 'Premium', '15,981,694,820', '15,981,694,820', '95.20', '6.7', '', '12/31/2020']
['']
['Vanguard Total Stock Market Index Fund', '', 'Premium', '432,495,433', '1,210,943', '2.58', '5.31', '', '01/31/2021']
How can I turn this into a dataframe, using the first list as the column names,
['Name', '', 'Ownership TrendPrevious 8 Qtrs', 'Shares', 'Change', '% TotalShares Held', '% TotalAssets', '', 'Date']
and all the lists after it as the rows of the dataframe?
Upvotes: 1
Views: 80
Reputation: 10970
It's simple: collect the rows in a list, then pass that list to the DataFrame constructor, using the first row as the column names.
from bs4 import BeautifulSoup
import pandas as pd
import requests
page = requests.get("http://investors.morningstar.com/ownership/shareholders-overview.html?t=AAPL&region=idn&culture=en-US&ownerCountry=USA")
soup = BeautifulSoup(page.content, 'lxml')
quote = soup.find('table', class_='r_table2 text2 print97').find_all('tr')
data = []
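for row in quote:
    # One list of stripped cell texts per table row
    data.append([x.text.strip() for x in row.find_all('td')])
# The first row holds the column names; the rest are the data
df = pd.DataFrame(data[1:], columns=data[0])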
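Note that the printed output in the question also contains spacer rows ([''] ) and two unnamed columns, which would end up in the dataframe as blank rows and empty columns. A minimal sketch of one way to tidy that up is below. It assumes the rows look exactly like the sample output in the question (the sample is hard-coded here, so no request is made), and the choice of which columns to convert to numbers is only an illustration.

```python
import pandas as pd

# Sample rows copied from the question's printed output (stand-in for the scraped `data`).
data = [
    ['Name', '', 'Ownership TrendPrevious 8 Qtrs', 'Shares', 'Change',
     '% TotalShares Held', '% TotalAssets', '', 'Date'],
    [''],
    ['Russell Inv Tax-Managed DI Large Cap SMA', '', 'Premium', '15,981,694,820',
     '15,981,694,820', '95.20', '6.7', '', '12/31/2020'],
    [''],
    ['Vanguard Total Stock Market Index Fund', '', 'Premium', '432,495,433',
     '1,210,943', '2.58', '5.31', '', '01/31/2021'],
]

header, *rows = data

# Keep only rows with one cell per header column; this drops the [''] spacer rows.
rows = [r for r in rows if len(r) == len(header)]
df = pd.DataFrame(rows, columns=header)

# Drop the columns whose header is an empty string.
df = df.loc[:, df.columns != ''].copy()

# Convert the comma-grouped share counts from text to integers (illustrative choice of columns).
for col in ['Shares', 'Change']:
    df[col] = pd.to_numeric(df[col].str.replace(',', '', regex=False))

print(df)
```

With the live page, replace the hard-coded list with the `data` list built in the loop above; the filtering and cleanup steps stay the same as long as the table keeps this layout.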
Upvotes: 3