diogoaos

Reputation: 400

Pandas groupby(...).mean() lost keys

I have a dataframe results (the result of deleting a column from another dataframe) with the following structure (can't post pics, sorry):

----------------------------
|type|N|D|NATC|K|iters|time|
----------------------------
rows of data
----------------------------

I use groupby so I can then get the mean of the groups, like so:

rounds = results.groupby(['type','N','D','NATC','K','iters'])
results_mean = rounds.mean()

I get the means I wanted, but there is a problem with the keys. The results_mean dataframe has the following structure:

----------------------------
|    | | |    | |     |time|
|type|N|D|NATC|K|iters|    |
----------------------------
rows of data
----------------------------

The only key recognized is time (I executed results_mean.keys()).

What did I do wrong? How can I fix it?

Upvotes: 2

Views: 2540

Answers (2)

luis olenscki

Reputation: 1

I had the same problem of losing the dataframe's keys after using groupby(), and the workaround I found was to write the DataFrame to a CSV file and then read that file back.

Upvotes: 0

Carsten

Reputation: 18446

In your aggregated data, time is the only column. The other ones have become levels of the index, which is why results_mean.keys() no longer lists them.
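To see this for yourself, here is a minimal sketch. The sample values are made up for illustration (only the column names come from the question), and group_cols is just a helper name I introduced:

```python
import pandas as pd

# Hypothetical sample data: column names follow the question, values are invented.
results = pd.DataFrame({
    "type": ["a", "a", "b", "b"],
    "N": [10, 10, 20, 20],
    "D": [2, 2, 2, 2],
    "NATC": [1, 1, 1, 1],
    "K": [3, 3, 3, 3],
    "iters": [5, 5, 5, 5],
    "time": [1.0, 3.0, 2.0, 4.0],
})

group_cols = ["type", "N", "D", "NATC", "K", "iters"]
results_mean = results.groupby(group_cols).mean()

# The grouping columns are now levels of a MultiIndex, not columns.
index_names = list(results_mean.index.names)
column_names = list(results_mean.columns)  # same labels that keys() reports
print(index_names)
print(column_names)
```

The values are not lost: they are still reachable through results_mean.index, they just no longer count as columns.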

groupby has a parameter as_index. From the documentation:

as_index : boolean, default True

For aggregated output, return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output

So you can get the desired output by calling

rounds = results.groupby(['type','N','D','NATC','K','iters'], as_index=False)
results_mean = rounds.mean()

Or, if you prefer, you can always convert the index levels back into regular columns with reset_index. Using

rounds = results.groupby(['type','N','D','NATC','K','iters'])
results_mean = rounds.mean().reset_index()

should have the desired effect as well.
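As a rough check that the two approaches give the same flat layout, here is a small self-contained sketch. The data is invented for illustration, and the names group_cols, flat_a and flat_b are mine, not from the question:

```python
import pandas as pd

# Hypothetical sample data: column names follow the question, values are invented.
results = pd.DataFrame({
    "type": ["a", "a", "b", "b"],
    "N": [10, 10, 20, 20],
    "D": [2, 2, 2, 2],
    "NATC": [1, 1, 1, 1],
    "K": [3, 3, 3, 3],
    "iters": [5, 5, 5, 5],
    "time": [1.0, 3.0, 2.0, 4.0],
})

group_cols = ["type", "N", "D", "NATC", "K", "iters"]

# Option 1: keep the group labels as ordinary columns from the start.
flat_a = results.groupby(group_cols, as_index=False).mean()

# Option 2: aggregate with the default index, then move the levels back to columns.
flat_b = results.groupby(group_cols).mean().reset_index()

print(list(flat_a.columns))
print(list(flat_b.columns))
```

In both cases keys() should list all seven column names again, with time holding the per-group mean.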

Upvotes: 7
