Rodger

Reputation: 13

Error with dataframe append

I'm trying to append each file's dataframe to a single master dataframe. The final dataframe is empty, however. I printed each one before the append, and the individual dataframes do contain data.

Code:

import pandas as pd
import os

source_directory = r'H:\folder'

masterDF = pd.DataFrame()

for file in os.listdir(source_directory):
    if file.endswith(".xlsx") or file.endswith(".xls"):
        dataframe = pd.read_excel(source_directory + '\\' + file)
        print(dataframe)
        masterDF.append(dataframe)

print(masterDF)

Result:

   Col_A  Col_B
0     46      5
1     56      4
2     45      4
3     45      4
4    455      5
5      4      4
6      4      5
7    544      4
   Col_A  Col_B
0     64      9
1      4     45
2      4     42
3     45      4
4     46      7
5     56     75
Empty DataFrame
Columns: []
Index: []

Upvotes: 0

Views: 5055

Answers (1)

Jan van der Vegt

Reputation: 1511

DataFrame.append doesn't work in place; it returns a new DataFrame with the rows appended, so you have to assign the result back to masterDF:

masterDF = masterDF.append(dataframe)

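However, every call to append builds a whole new DataFrame, copying all the rows collected so far. A much faster alternative is to collect the DataFrames read from the Excel files in a list and then call pd.concat(my_list) once, which returns a single DataFrame. Note also that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions pd.concat is the way to go.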
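To see the list-then-concat idea in isolation, here is a minimal sketch that uses two small in-memory DataFrames instead of Excel files. The names first, second and frames and the values are only placeholders for illustration, not your data.

```python
import pandas as pd

# Stand-ins for the DataFrames that pd.read_excel would return (illustrative values)
first = pd.DataFrame({"Col_A": [46, 56], "Col_B": [5, 4]})
second = pd.DataFrame({"Col_A": [64, 4], "Col_B": [9, 45]})

# Collect the pieces in a plain Python list ...
frames = [first, second]

# ... and concatenate once; ignore_index=True renumbers the rows from 0
combined = pd.concat(frames, ignore_index=True)

print(combined)
```

Without ignore_index=True the result would keep each piece's own index (0, 1, 0, 1), which is rarely what you want for a master table.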

Editing your code I would do it like this:

import pandas as pd
import os

source_directory = r'H:\folder'

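# Collect each file's DataFrame here and concatenate once at the end
master_list = []

for file in os.listdir(source_directory):
    # str.endswith accepts a tuple of suffixes
    if file.endswith((".xlsx", ".xls")):
        dataframe = pd.read_excel(os.path.join(source_directory, file))
        print(dataframe)
        master_list.append(dataframe)

# ignore_index=True renumbers the rows instead of repeating each file's index
masterDF = pd.concat(master_list, ignore_index=True)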
print(masterDF)
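
One thing to watch out for: pd.concat raises a ValueError when it is given an empty list, which is what happens if the folder contains no Excel files. A possible way to guard against that is to wrap the loop in a small helper. The function name combine_excel_files is made up for this sketch, and returning an empty DataFrame in that case is just one reasonable choice.

```python
import os

import pandas as pd


def combine_excel_files(source_directory):
    """Read every .xls/.xlsx file in source_directory into one DataFrame."""
    frames = []
    # sorted() only makes the row order reproducible between runs
    for name in sorted(os.listdir(source_directory)):
        if name.lower().endswith((".xlsx", ".xls")):
            frames.append(pd.read_excel(os.path.join(source_directory, name)))

    if not frames:
        # pd.concat([]) raises ValueError, so return an empty frame instead
        return pd.DataFrame()

    return pd.concat(frames, ignore_index=True)
```

You would then call it as masterDF = combine_excel_files(source_directory) and check masterDF.empty before using the result.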

Upvotes: 2
