How to append multiple items to pandas df?

Question

I have hundreds of excel files and I want to create a df that has the name of the excel file in one column and then the name of each tab in the second column. My script will iterate through each file name but the way I am appending is not right, and I'm not having much luck finding a solution.

os.chdir(r'C:\Users\mbobak\Documents\')
FileList = glob.glob('*.xlsx')

tabs= pd.DataFrame(columns=['filename','tabs'])

for filename in FileList:
    xl = pd.ExcelFile(filename).sheet_names
    tabs= tabs.append([filename,xl])

desired output:

filename    tabs
doc1.xlsx   tab1
doc1.xlsx   tab2
doc1.xlsx   tab3
doc1.xlsx   tab4
doc2.xlsx   tab1
doc2.xlsx   tab2
doc2.xlsx   tab3

jpp · Accepted Answer

Here is one way. It is good practice, and efficient, to create your dataframe in a single step. This is because appending to a list is cheaper than appending to a dataframe.

FileList = glob.glob('*.xlsx')

def return_files_tabs(FileList):
    for filename in FileList:
        for sheet in pd.ExcelFile(filename).sheet_names:
            yield [filename, sheet]

df = pd.DataFrame(list(return_files_tabs(FileList)),
                  columns=['Filename', 'Tab'])

Explanation

Cycle through each file and each sheet within sheet_names attribute.
Yield filename and sheet via a generator.
Build dataframe from list of exhausted generator, name columns.

How to append multiple items to pandas df?

Answers (2)

Related Questions