Reputation: 43
I'm trying to extract all columns from multiple excel files and then map the filename to each extracted column however I'm struggling to work around a 'TypeError: Index does not support mutable operations'.
Below are my two files:
Fund_Data.xlsx:
FUND ID FUND NAME AMOUNT client code Price description Trade Date Trade Datetime
0 10101 Holdings company A 10000.5 1234 124.3 abcd 2020-08-19 2020-08-19 12:30:00
1 20202 Holdings company B -2000.5 192 -24.2 abcd 2020-08-20 2020-08-20 12:30:00
2 30303 Holdings company C 3000.5 123 192 NaN 2020-08-21 2020-08-21 12:30:00
3 10101 Holdings company A 10000 1234567 5.5 NaN 2020-08-22 2020-08-22 12:30:00
4 20202 Holdings company B 10000.5 9999 3.887 abcd 2020-08-23 2020-08-23 12:30:00
Stocks.xlsx
ID STOCK VALUE
1 3i 100
2 Admiral Group 200
3 Anglo American 300
4 Antofagasta 100
5 Ashtead 200
6 Associated British Foods 300
7 AstraZeneca 400
8 Auto Trader Group 500
9 Avast 600
And here is my code so far:
import pandas as pd
from os import walk
f = []
directory = 'C:/Users/rrai020/Documents/Python Scripts/DD'
for (dirpath, dirnames, filenames) in os.walk(directory):
for x in filenames:
if x.endswith('xlsx'):
f.append(x)
#f = ['Fund_Data.xlsx', 'Stocks.xlsx'] created a list from filenames in directory ^^^
data = pd.DataFrame() # initialize empty df
for filename in f:
df = pd.read_excel(filename, dtype = object, ignore_index=True).columns # read in each excel to df
df['filename'] = filename # add a column with the filename
data = data.append(df) # add all small df's to big df
print(data)
I'm trying to achieve the following output (or similar):
Field Name Filename
FUND ID Fund_Data.xlsx
FUND NAME Fund_Data.xlsx
AMOUNT Fund_Data.xlsx
client code Fund_Data.xlsx
Price Fund_Data.xlsx
description Fund_Data.xlsx
Trade Date Fund_Data.xlsx
Trade Datetime Fund_Data.xlsx
Trade time Fund_Data.xlsx
ID Stocks.xlsx
STOCK Stocks.xlsx
VALUE Stocks.xlsx
I would like the code to be flexible so that it can work for more than the 2 files I have here. Apologies if this is trivial, I'm still learning!
Upvotes: 1
Views: 655
Reputation: 9941
The problem is with the dataframe that you're appending. We need to create a dataframe with Field Name
, Filename
columns for each file inside the loop, and then append it to data
.
Here's an option:
data = pd.DataFrame()
for filename in f:
# read in each excel to df
df = pd.read_excel(filename, dtype = object, ignore_index=True).columns
# create a dataframe with (Field Name, Filename) columns for current file
x = pd.DataFrame({'Field Name': x.columns, 'Filename': filename})
# append to the global dataframe
data = data.append(x)
data
Output:
Field Name Filename
0 FUND ID Fund_Data.xlsx
1 FUND NAME Fund_Data.xlsx
2 AMOUNT Fund_Data.xlsx
3 client code Fund_Data.xlsx
4 Price description Fund_Data.xlsx
5 Trade Date Fund_Data.xlsx
6 Trade Datetime Fund_Data.xlsx
7 ID Stocks.xlsx
8 STOCK Stocks.xlsx
9 VALUE Stocks.xlsx
Upvotes: 1