MITHU

Reputation: 154

Failed to add content from multiple tables to a csv file using pandas.read_html()

I'm trying to scrape tabular content from this webpage and write it to a csv file using pandas.read_html(). There are two tables on the page that match the same selector, table.table--overflow[aria-label^='Financials'], and I want to grab both. My current implementation prints the content of both tables but writes only the last one to the csv file.

import requests
import pandas as pd
from bs4 import BeautifulSoup

link = 'https://www.marketwatch.com/investing/stock/mbin/financials/balance-sheet'

def get_tabular_content(s,link):
    res = s.get(link)
    soup = BeautifulSoup(res.text,"lxml")
    for selector in soup.select("table.table--overflow[aria-label^='Financials']"):
        df = pd.read_html(str(selector))[0]
        df.to_csv('marketwatch.csv', header=True, index=False)
        print(df)

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36'
    get_tabular_content(s,link)

How can I add content from multiple tables to a csv file using pandas.read_html()?

Upvotes: 0

Views: 92

Answers (1)

balderman

Reputation: 23815

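Stop overwriting the output file. Each call to df.to_csv('marketwatch.csv', ...) inside the loop replaces the file written by the previous iteration, so only the last table survives. If you use a unique name per table instead, you will get multiple output files, each one representing one HTML table from the page.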
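A minimal sketch of the unique-name approach, assuming the tables have already been parsed into DataFrames. The helper name write_tables_separately, the out_dir argument, and the marketwatch_1.csv, marketwatch_2.csv naming scheme are hypothetical choices for illustration, not something from the question:

```python
from pathlib import Path

import pandas as pd


def write_tables_separately(frames, out_dir, stem="marketwatch"):
    """Write each DataFrame to its own CSV file and return the paths written.

    Hypothetical helper: the name and the numbering scheme are assumptions.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    # start=1 so the files are numbered <stem>_1.csv, <stem>_2.csv, ...
    for i, df in enumerate(frames, start=1):
        path = out_dir / f"{stem}_{i}.csv"
        df.to_csv(path, header=True, index=False)
        paths.append(path)
    return paths
```

Inside the question's loop, the same idea amounts to enumerating the selected tables and putting the counter into the file name passed to df.to_csv().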

If you want a single csv that contains the data from all HTML tables, append each df to a list (list_of_df) and, once the loop is done, call frame = pd.concat(list_of_df, axis=0, ignore_index=True):

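from io import StringIO

# Collect one DataFrame per matching table instead of writing inside the loop.
list_of_df = []
for selector in soup.select("table.table--overflow[aria-label^='Financials']"):
    # Newer pandas versions deprecate passing a literal HTML string, so wrap it in StringIO.
    df = pd.read_html(StringIO(str(selector)))[0]
    list_of_df.append(df)

# Stack the tables row-wise and write the csv once, after the loop.
frame = pd.concat(list_of_df, axis=0, ignore_index=True)
frame.to_csv('marketwatch.csv', header=True, index=False)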
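To see what the concat step does in isolation, here is a small sketch on two toy frames. The rows are borrowed from the output for illustration only; which table each row really belongs to is an assumption, and the variable names first_table and second_table are hypothetical:

```python
import pandas as pd

# Toy stand-ins for the two scraped tables (the split between them is assumed).
first_table = pd.DataFrame(
    {"Item": ["Total Cash & Due from Banks"], "2020": ["10.06M"]}
)
second_table = pd.DataFrame(
    {
        "Item": ["Total Equity", "Liabilities & Shareholders' Equity"],
        "2020": ["810.62M", "9.65B"],
    }
)

# axis=0 stacks the frames row-wise; ignore_index=True renumbers the rows
# 0..n-1 instead of keeping each frame's own index.
frame = pd.concat([first_table, second_table], axis=0, ignore_index=True)
print(frame)
```

Because both frames share the same column names, the result keeps a single set of columns and simply has the rows of the second table placed under those of the first.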

The output ('marketwatch.csv') contains 75 records:

Item Item,2016,2017,2018,2019,2020,5-year trend
Total Cash & Due from Banks Total Cash & Due from Banks,10.04M,18.91M,25.86M,13.91M,10.06M,
Cash & Due from Banks Growth Cash & Due from Banks Growth,-,88.37%,36.76%,-46.20%,-27.65%,
Investments - Total Investments - Total,1.69B,1.92B,1.67B,3.19B,3.95B,
...
Return On Average Total Equity Return On Average Total Equity,-,-,-,-,24.66%,
Accumulated Minority Interest Accumulated Minority Interest,-,-,-,-,-,
Total Equity Total Equity,206.29M,367.47M,421.24M,653.73M,810.62M,
Liabilities & Shareholders' Equity Liabilities & Shareholders' Equity,2.72B,3.39B,3.88B,6.37B,9.65B,

Upvotes: 1
