ChrisMcK
ChrisMcK

Reputation: 27

Loading Pandas Data Frame into Excel using writer.save() and getting indexing error

I am aggregating a Pandas DF using numpy size and then want to load the results into an Excel using writer.save. But I am getting the following error: NotImplementedError: Writing as Excel with a MultiIndex is not yet implemented.

My data looks something like this:

agt_id     unique_id
abc123     ab12345
abc123     cd23456
abc123     de34567
xyz987     ef45678
xyz987     fg56789

My results should look like:

agt_id     unique_id
abc123     3
xyz987     2

This is an example of my code:

df_agtvol = df_agt.groupby('agt_id').agg({'unique_id':[np.size]})
writer = pd.ExcelWriter(outfilepath, engine='xlsxwriter') 
df_agtvol.to_excel(writer, sheet_name='agt_vols')

I have tried to reset the index by using:

df_agt_vol_final = df_agtvol.set_index([df_agtvol.index, 'agt_id'], inplace=True)

based on some research, but am getting a completely different error.

I am relatively new to working with Pandas dataframes, so any help would be appreciated.

Upvotes: 1

Views: 89

Answers (1)

Alex
Alex

Reputation: 19114

You don't need a MultiIndex. The reason you get one is because np.size is wrapped in a list.

Although not explicitly documented, Pandas interprets everything in the list as a subindex for 'unique_id'. This use case falls under the "nested dict of names -> dicts of functions" case in the linked documentation.

So

df_agtvol = df_agt.groupby('agt_id').agg({'unique_id':[np.size]})

Should be

df_agtvol = df_agt.groupby('agt_id').agg({'unique_id': np.size})

This is still overly complicated and you can get the same results with a call to the count method.

df_agtvol = df_agt.groupby('agt_id').count()

Upvotes: 1

Related Questions