How to map filenames to the columns extracted from multiple excel files using pandas

Question

I'm trying to extract all columns from multiple excel files and then map the filename to each extracted column however I'm struggling to work around a 'TypeError: Index does not support mutable operations'.

Below are my two files:

Fund_Data.xlsx:
    
    FUND ID  FUND NAME           AMOUNT   client code  Price description   Trade Date    Trade Datetime  
0   10101   Holdings company A  10000.5   1234        124.3   abcd        2020-08-19    2020-08-19 12:30:00 
1   20202   Holdings company B  -2000.5   192         -24.2   abcd        2020-08-20    2020-08-20 12:30:00 
2   30303   Holdings company C  3000.5    123         192     NaN         2020-08-21    2020-08-21 12:30:00 
3   10101   Holdings company A  10000     1234567     5.5     NaN         2020-08-22    2020-08-22 12:30:00
4   20202   Holdings company B  10000.5   9999        3.887   abcd        2020-08-23    2020-08-23 12:30:00

Stocks.xlsx

ID   STOCK                     VALUE
1   3i                         100
2   Admiral Group              200
3   Anglo American             300
4   Antofagasta                100
5   Ashtead                    200
6   Associated British Foods   300
7   AstraZeneca                400
8   Auto Trader Group          500
9   Avast                      600

And here is my code so far:

import pandas as pd
from os import walk

f = []
directory = 'C:/Users/rrai020/Documents/Python Scripts/DD'
for (dirpath, dirnames, filenames) in os.walk(directory):
    for x in filenames:
        if x.endswith('xlsx'):
            f.append(x)
#f = ['Fund_Data.xlsx', 'Stocks.xlsx'] created a list from filenames in directory ^^^

data = pd.DataFrame() # initialize empty df
for filename in f:
    df = pd.read_excel(filename, dtype = object, ignore_index=True).columns # read in each excel to df
    df['filename'] = filename # add a column with the filename
    data = data.append(df) # add all small df's to big df 

print(data)

I'm trying to achieve the following output (or similar):

Field Name       Filename
FUND ID          Fund_Data.xlsx
FUND NAME        Fund_Data.xlsx
AMOUNT           Fund_Data.xlsx
client code      Fund_Data.xlsx
Price            Fund_Data.xlsx
description      Fund_Data.xlsx
Trade Date       Fund_Data.xlsx
Trade Datetime   Fund_Data.xlsx
Trade time       Fund_Data.xlsx
ID               Stocks.xlsx
STOCK            Stocks.xlsx
VALUE            Stocks.xlsx

I would like the code to be flexible so that it can work for more than the 2 files I have here. Apologies if this is trivial, I'm still learning!

How to map filenames to the columns extracted from multiple excel files using pandas

Answers (1)

Related Questions