fetch the record which has the max count of a column after grouping using another column

Question

I am currently learning pandas in python and have come across a problem. I have a csv file containing US census data csv variables info. For this question we can focus columns named STNAME (eg. albama) and a column named COUNTY.

The STNAME has repeating values for every county.That is it can have multiple 'albama' for every county in it.

The question is : Which state has the most counties in it? (hint: consider the sumlevel key carefully! You'll need this for future questions too...)

What i am trying : 1.a> census_df[census_df.groupby('STNAME')['COUNTY'].count() == max(census_df.groupby('STNAME')['COUNTY'].count())]

1.b> census_df[census_df.groupby('STNAME')['COUNTY'].count() == (census_df.groupby('STNAME')['COUNTY'].count().max())]

1.c> census_df.groupby('STNAME')[census_df.groupby('STNAME')['COUNTY'].count() == (census_df.groupby('STNAME')['COUNTY'].count().max())]

Here I am trying to fetch the record which has the max county count. The last one gives an error : 'Columns not found: False, True'

2>

x = (census_df.groupby('STNAME')['COUNTY'].count() == census_df.groupby('STNAME')['COUNTY'].count().max() )
x[x['COUNTY'] == True]

Here I am grouping by state name and counting the counties for each state and fetching the max value

I can fetch the value as a bool. as in there is only a single state with bool "True". but I don't know how to fetch only that single record.

How should i procceed ?

Kashish Jhaveri · Accepted Answer

I found it!

census_df.groupby('STNAME')['COUNTY'].count()[census_df.groupby('STNAME')['COUNTY'].count() == (census_df.groupby('STNAME')['COUNTY'].count().max())]

I thought that the only thing that was missing was ['COUNTY'].count() . I thought that it was unneccessary before as i was querying it within the [] brackets , but i guess I was wrong. My understanding is that the outermost part was of different format than that which was being fetched by querying.

fetch the record which has the max count of a column after grouping using another column

Answers (1)

Related Questions