Mark K
Mark K

Reputation: 9358

Python, Pandas to combine Excel files into 1, with files names as Column A

I am combining Excel spreadsheets into 1 spreadsheet, and want to insert the filenames of the spreadsheets as first column when they are combined.

The Excel files to be combined are: 20170801.xlsx, 20170901.xlsx, 20171001.xlsx

This is the code I have:

working_folder = 'c:\\abc\\'
files = os.listdir(working_folder)

files_xls = [];
for f in files:
    if f.endswith(".xlsx"):
        fff = working_folder + f
        files_xls.append(fff)

df = pd.DataFrame();

for f in files_xls:
    data = pd.read_excel(f, skip_footer=0)
    file_date = basename(f).replace(".xlsx", "")
    #data.index = pd.MultiIndex.from_tuples([("", file_date)] * len(data))
    data.index = pd.MultiIndex.from_tuples([(file_date)] * len(data))
    df = df.append(data)

df.to_excel(working_folder + 'final.xls', merge_cells = False)

The problem lies in the line data.index = pd.MultiIndex….

When I use:

data.index = pd.MultiIndex.from_tuples([(file_date)] * len(data))

it produce a spreadhsheet like this:

enter image description here

When I use:

data.index = pd.MultiIndex.from_tuples([("", file_date)] * len(data))

It produce a better looking spreadsheet but an empty column A like this: enter image description here

Not to use MultiIndex, how can I have a final spreadsheet looked like this?

enter the image description here

Thank you.

Upvotes: 2

Views: 867

Answers (1)

jezrael
jezrael

Reputation: 863431

I believe you dont need MultiIndex, only change:

data.index = pd.MultiIndex.from_tuples([(file_date)] * len(data))

to:

data.index = [file_date] * len(data)

Upvotes: 5

Related Questions