Reputation: 53
I am currently learning pandas in python and have come across a problem. I have a csv file containing US census data csv variables info. For this question we can focus columns named STNAME (eg. albama) and a column named COUNTY.
The STNAME has repeating values for every county.That is it can have multiple 'albama' for every county in it.
The question is : Which state has the most counties in it? (hint: consider the sumlevel key carefully! You'll need this for future questions too...)
What i am trying :
1.a> census_df[census_df.groupby('STNAME')['COUNTY'].count() == max(census_df.groupby('STNAME')['COUNTY'].count())]
1.b> census_df[census_df.groupby('STNAME')['COUNTY'].count() == (census_df.groupby('STNAME')['COUNTY'].count().max())]
1.c> census_df.groupby('STNAME')[census_df.groupby('STNAME')['COUNTY'].count() == (census_df.groupby('STNAME')['COUNTY'].count().max())]
Here I am trying to fetch the record which has the max county count. The last one gives an error : 'Columns not found: False, True'
2>
x = (census_df.groupby('STNAME')['COUNTY'].count() == census_df.groupby('STNAME')['COUNTY'].count().max() )
x[x['COUNTY'] == True]
Here I am grouping by state name and counting the counties for each state and fetching the max value
I can fetch the value as a bool. as in there is only a single state with bool "True". but I don't know how to fetch only that single record.
How should i procceed ?
Upvotes: 0
Views: 76
Reputation: 53
I found it!
census_df.groupby('STNAME')['COUNTY'].count()[census_df.groupby('STNAME')['COUNTY'].count() == (census_df.groupby('STNAME')['COUNTY'].count().max())]
I thought that the only thing that was missing was ['COUNTY'].count() . I thought that it was unneccessary before as i was querying it within the [] brackets , but i guess I was wrong. My understanding is that the outermost part was of different format than that which was being fetched by querying.
Upvotes: 1