gwydion93
gwydion93

Reputation: 1923

Is KeyError in dataframe caused incorrect groupby application?

I have made a summarized dataframe from another dataframe using .groupby and .agg.

sum_df = cnms_df.groupby(['Tiermetric', 'Mod_unMod', 'Val_Combined', 'Det_Approx', 'State', 'Region', 'CO_FIPS']).agg({'MILES': 'sum'})

However, something doesn't look quite right; there seems to be missing values.

Tiermetric Mod_unMod Val_Combined Det_Approx State Region CO_FIPS MILES 
Other 1    UnMapped  ASSESSED     Approx     IN    5      18001   8.397255
                                                          18003   3.284817
                                                          18011   64.019156
                                                          18017   9.068318
TIER 4     Modernized VALID       Detailed   NC    4      37119   2.046716
                                             NC    4      37120   59.890107
                                             NC    4      37025   3.773599

When I try to do something like this:

sum_df['CO_FIPS'][0]

I get an error that seems related to indexing:

KeyError: 'CO_FIPS'

What I want is for my final dataframe to look like this:

Tiermetric Mod_unMod Val_Combined Det_Approx State Region CO_FIPS MILES 
Other 1    UnMapped  ASSESSED     Approx     IN    5      18001   8.397255
Other 1    UnMapped  ASSESSED     Approx     IN    5      18003   3.284817                    
Other 1    UnMapped  ASSESSED     Approx     IN    5      18011   64.019156
Other 1    UnMapped  ASSESSED     Approx     IN    5      18017   9.068318
TIER 4     Modernized VALID       Detailed   NC    4      37119   2.046716
TIER 4     Modernized VALID       Detailed   NC    4      37120   59.890107
TIER 4     Modernized VALID       Detailed   NC    4      37025   3.773599

How can I fix this?

Upvotes: 0

Views: 340

Answers (2)

Suhas C V
Suhas C V

Reputation: 522

Set as_index to False, by default this will be True:

sum_df = cnms_df.groupby(as_index=False,by=['Tiermetric', 'Mod_unMod', 'Val_Combined', 'Det_Approx', 'State', 'Region', 'CO_FIPS']).agg({'MILES': 'sum'})

Upvotes: 0

Swagga Ting
Swagga Ting

Reputation: 602

The groupby and sum cause all of these columns to become a multi-index. You can use reset_index() or pass in as_index=False to turn the index into columns.

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'a':[1, 1, 2], 'b':[10, 20, 30]})

In [3]: df
Out[3]:
   a   b
0  1  10
1  1  20
2  2  30

In [4]: grouped = df.groupby('a').agg({'b': 'sum'})

In [5]: grouped  # a is an index now
Out[5]:
    b
a
1  30
2  30

In [6]: grouped = df.groupby('a', as_index=False).agg({'b': 'sum'})

In [7]: grouped  # now a is a column
Out[7]:
   a   b
0  1  30
1  2  30

This will work with multi-indexes as well.

Upvotes: 1

Related Questions