Reputation: 313
I have a function in Python with Pandas that transforms some df read in from an excel file.
That Excel file has 5 sheets, I would like to process then all through the same function, and then join them at the end, with an extra column df['customer'] = sheet name
Lets say I want to run the sheets through the below function:
def function(df):
df = df.rename(columns=df.iloc[0]).drop(df.index[0]) #drop first row
df = df.rename(columns={np.nan: 'Model'}) #rename column to 'model'
indexNames = df[df['Model'].isin(['No','b', 'a','1988 Total'])].index
df = df.drop(indexNames)
df = df.dropna(axis=1,how='all')
return df
In the end, I would like to have 5 different df's(they have the same structure) that I could join at the end. However, from the 5 different sheets, I need the sheet_name to appear as a column, so I can tell them apart.
I tried joining them before hand with this code:
df = pd.concat(pd.read_excel("Final- Stock and Sales WG 2021 New format.xlsx", sheet_name=None), ignore_index=True)
However I get a value error when running on my function of ValueError: cannot reindex from a duplicate axis
at this line
indexNames = df[df['Model'].isin(['No','b', 'a','1988 Total'])].index
Any corrections or other approaches welcome. Thanks!
Upvotes: 1
Views: 1676
Reputation: 534
xls = pd.ExcelFile('datavallatestitemstop2.xlsx')
df_main = pd.DataFrame()
for sheet in xls.sheet_names:
df = pd.read_excel(xls, sheet)
df['Sheet-Name'] = sheet
df_main = df_main.append(df)
Upvotes: 2