Reputation: 27
I am aggregating a Pandas DF using numpy size and then want to load the results into an Excel using writer.save. But I am getting the following error: NotImplementedError: Writing as Excel with a MultiIndex is not yet implemented.
My data looks something like this:
agt_id unique_id
abc123 ab12345
abc123 cd23456
abc123 de34567
xyz987 ef45678
xyz987 fg56789
My results should look like:
agt_id unique_id
abc123 3
xyz987 2
This is an example of my code:
df_agtvol = df_agt.groupby('agt_id').agg({'unique_id':[np.size]})
writer = pd.ExcelWriter(outfilepath, engine='xlsxwriter')
df_agtvol.to_excel(writer, sheet_name='agt_vols')
I have tried to reset the index by using:
df_agt_vol_final = df_agtvol.set_index([df_agtvol.index, 'agt_id'], inplace=True)
based on some research, but am getting a completely different error.
I am relatively new to working with Pandas dataframes, so any help would be appreciated.
Upvotes: 1
Views: 89
Reputation: 19114
You don't need a MultiIndex. The reason you get one is because np.size
is wrapped in a list.
Although not explicitly documented, Pandas
interprets everything in the list as a subindex for 'unique_id'
. This use case falls under the "nested dict of names -> dicts of functions" case in the linked documentation.
So
df_agtvol = df_agt.groupby('agt_id').agg({'unique_id':[np.size]})
Should be
df_agtvol = df_agt.groupby('agt_id').agg({'unique_id': np.size})
This is still overly complicated and you can get the same results with a call to the count
method.
df_agtvol = df_agt.groupby('agt_id').count()
Upvotes: 1