Reputation: 13
I'm trying to append each file's dataframe to a single master dataframe, but the final dataframe is empty. I printed each dataframe before the append, and each one contains data.
Code:
import pandas as pd
import os
source_directory = r'H:\folder'
masterDF = pd.DataFrame()
for file in os.listdir(source_directory):
    if file.endswith(".xlsx") or file.endswith(".xls"):
        dataframe = pd.read_excel(source_directory + '\\' + file)
        print(dataframe)
        masterDF.append(dataframe)
print(masterDF)
Result:
Col_A Col_B
0 46 5
1 56 4
2 45 4
3 45 4
4 455 5
5 4 4
6 4 5
7 544 4
Col_A Col_B
0 64 9
1 4 45
2 4 42
3 45 4
4 46 7
5 56 75
Empty DataFrame
Columns: []
Index: []
Upvotes: 0
Views: 5055
Reputation: 1511
append doesn't work in place; it returns a new DataFrame with the rows appended, so you have to assign the result back to masterDF:
masterDF = masterDF.append(dataframe)
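However, each append builds a completely new dataframe, so calling it in a loop copies all previously appended rows on every iteration. A much faster alternative is to collect the dataframes read from the Excel files in a list and then call pd.concat(my_list) once, which returns a single dataframe. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions only the pd.concat approach works.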
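To make the difference concrete, here is a minimal sketch using small in-memory frames as hypothetical stand-ins for the Excel files (the values are made up for illustration). The loop uses pd.concat rather than append so that it also runs on pandas 2.x, but it has the same cost pattern: a new dataframe is built on every pass.

```python
import pandas as pd

# Hypothetical stand-ins for the frames read from the Excel files.
frames = [
    pd.DataFrame({"Col_A": [46, 56], "Col_B": [5, 4]}),
    pd.DataFrame({"Col_A": [64, 4], "Col_B": [9, 45]}),
    pd.DataFrame({"Col_A": [45], "Col_B": [4]}),
]

# Growing the result inside the loop: every pass copies all rows seen so far.
grown = frames[0]
for frame in frames[1:]:
    grown = pd.concat([grown, frame], ignore_index=True)

# Collecting first and concatenating once: the rows are copied a single time.
combined = pd.concat(frames, ignore_index=True)

# Both approaches should produce the same data; only the amount of copying differs.
print(grown.equals(combined))
```

With a handful of small files the difference is negligible, but it grows quickly with the number of files.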
Editing your code, I would do it like this:
import os

import pandas as pd

source_directory = r'H:\folder'

# Collect each file's dataframe in a list instead of appending one at a time.
master_list = []
for file in os.listdir(source_directory):
    if file.endswith((".xlsx", ".xls")):
        dataframe = pd.read_excel(os.path.join(source_directory, file))
        print(dataframe)
        master_list.append(dataframe)

# Concatenate once; ignore_index=True renumbers the rows from 0.
masterDF = pd.concat(master_list, ignore_index=True)
print(masterDF)
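If you want something reusable, the same idea could be wrapped in a function. This is only a sketch under a few assumptions: the name combine_excel_files and its reader parameter are hypothetical (the parameter is there so a different loader can be swapped in), and it returns an empty dataframe when the folder has no Excel files, because pd.concat raises a ValueError when given an empty list.

```python
from pathlib import Path

import pandas as pd


def combine_excel_files(folder, reader=pd.read_excel):
    """Read every .xls/.xlsx file in `folder` and return one combined DataFrame.

    `reader` is a hypothetical hook: any callable that takes a path and
    returns a DataFrame. It defaults to pandas' own read_excel.
    """
    # Sort the paths so the row order of the result is predictable.
    paths = sorted(
        p for p in Path(folder).iterdir()
        if p.suffix.lower() in {".xls", ".xlsx"}
    )
    frames = [reader(p) for p in paths]
    if not frames:
        # pd.concat raises ValueError on an empty list, so return an empty frame.
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

You would then call it as masterDF = combine_excel_files(r'H:\folder').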
Upvotes: 2