Nairda123
Nairda123

Reputation: 313

Joining multiple sheets together with column as their sheet name

I have a function in Python with Pandas that transforms some df read in from an excel file.

That Excel file has 5 sheets, I would like to process then all through the same function, and then join them at the end, with an extra column df['customer'] = sheet name

Lets say I want to run the sheets through the below function:

def function(df):
    df = df.rename(columns=df.iloc[0]).drop(df.index[0]) #drop first row
    df = df.rename(columns={np.nan: 'Model'}) #rename column to 'model'

    indexNames = df[df['Model'].isin(['No','b', 'a','1988 Total'])].index
    df = df.drop(indexNames)
    df = df.dropna(axis=1,how='all')

    return df

In the end, I would like to have 5 different df's(they have the same structure) that I could join at the end. However, from the 5 different sheets, I need the sheet_name to appear as a column, so I can tell them apart.

I tried joining them before hand with this code:

df = pd.concat(pd.read_excel("Final- Stock and Sales WG 2021 New format.xlsx", sheet_name=None), ignore_index=True)

However I get a value error when running on my function of ValueError: cannot reindex from a duplicate axis at this line indexNames = df[df['Model'].isin(['No','b', 'a','1988 Total'])].index

Any corrections or other approaches welcome. Thanks!

Upvotes: 1

Views: 1676

Answers (1)

abdul
abdul

Reputation: 534

xls = pd.ExcelFile('datavallatestitemstop2.xlsx')

df_main = pd.DataFrame()
for sheet in xls.sheet_names:
    df = pd.read_excel(xls, sheet)
    df['Sheet-Name'] = sheet
    df_main = df_main.append(df)

Outputs -
enter image description here

Upvotes: 2

Related Questions