CandleWax
CandleWax

Reputation: 2219

How to append multiple items to pandas df?

I have hundreds of excel files and I want to create a df that has the name of the excel file in one column and then the name of each tab in the second column. My script will iterate through each file name but the way I am appending is not right, and I'm not having much luck finding a solution.

os.chdir(r'C:\Users\mbobak\Documents\\')
FileList = glob.glob('*.xlsx')

tabs= pd.DataFrame(columns=['filename','tabs'])

for filename in FileList:
    xl = pd.ExcelFile(filename).sheet_names
    tabs= tabs.append([filename,xl])

desired output:

filename    tabs
doc1.xlsx   tab1
doc1.xlsx   tab2
doc1.xlsx   tab3
doc1.xlsx   tab4
doc2.xlsx   tab1
doc2.xlsx   tab2
doc2.xlsx   tab3

Upvotes: 2

Views: 3795

Answers (2)

jpp
jpp

Reputation: 164673

Here is one way. It is good practice, and efficient, to create your dataframe in a single step. This is because appending to a list is cheaper than appending to a dataframe.

FileList = glob.glob('*.xlsx')

def return_files_tabs(FileList):
    for filename in FileList:
        for sheet in pd.ExcelFile(filename).sheet_names:
            yield [filename, sheet]

df = pd.DataFrame(list(return_files_tabs(FileList)),
                  columns=['Filename', 'Tab'])

Explanation

  • Cycle through each file and each sheet within sheet_names attribute.
  • Yield filename and sheet via a generator.
  • Build dataframe from list of exhausted generator, name columns.

Upvotes: 5

Bill
Bill

Reputation: 11613

I like @jpp's answer but if you don't want to use a generator this is how I would do it:

FileList = glob.glob('*.xlsx')

filenames = []
sheets = []

for filename in FileList:
    for sheet in pd.ExcelFile(filename).sheet_names:
        filenames.append(filename)
        sheets.append(sheet)

df = pd.DataFrame(
    {'Filename': filenames, 'Tab': sheets}
)

Upvotes: 3

Related Questions