Reputation: 41
Data Set in Question:
For the data set show in the above image, I am trying to find the three most populous states while only taking into consideration the three most populous counties for each state. I use CENSUS2010POP
.
This function should return a list of string values(in order of highest population to lowest population).
Below is My Code:
x=census_df.groupby('STNAME')['CENSUS2010POP'].nlargest(3)
This statement returns a series in which it groups the three most populous counties of each state.
Now, what do I do beyond this to further find the most populous states?
Do I use loops or is there another efficient method to approach this?
Upvotes: 3
Views: 9436
Reputation: 123
temp provides data w.r.t State and County (maintaining the SUMLEV criteria as well)
tempo provides top 3 counties for each state
temp = census_df[census_df.SUMLEV==50].groupby(['STNAME','CTYNAME']).agg({"CENSUS2010POP":sum})
tempo = pd.DataFrame(temp['CENSUS2010POP'].groupby(level=0, group_keys=False).nlargest(3))
tempo.CENSUS2010POP.groupby(level=0, group_keys=False).sum().nlargest(3).index.tolist()
Output
STNAME
California 37253956
Texas 25145561
New York 19378102
Name: CENSUS2010POP, dtype: int64
Upvotes: 0
Reputation: 21
Below is my take on this. I know generating a dataframe using iteraion is not a recommended approach, so if there are better suggestions please do share.
fulldf3 = pd.DataFrame()
for i in census_df['STNAME'].unique():
top3=census_df[(census_df['STNAME']==i) & (census_df['SUMLEV']==50)].groupby(['STNAME','CTYNAME'])['CENSUS2010POP'].sum().nlargest(3)
dftop3=pd.DataFrame(top3)
fulldf3=fulldf3.append(dftop3)
#gives the top3 states with population sum from top3 populous counties
fulldf3['CENSUS2010POP'].sum(level=0).nlargest(3)
#gives the names of the states in a list of strings
fulldf3['CENSUS2010POP'].sum(level=0).nlargest(3).index.to_list()
Output:
Upvotes: 2
Reputation: 1
def answer_six():
lst=[];
df=census_df.groupby(['STNAME'])
df=df.sum()
for i in range(3):
fd=df['CENSUS2010POP'].max()
df_temp=df[df['CENSUS2010POP']==fd]
lst.append(df_temp.index[0])
df=df.drop(df_temp.index[0],axis=0)
return lst
answer_six()
Upvotes: 0
Reputation: 53
def answer_six():
county = census_df[census_df['SUMLEV']==50]
states = county.groupby('STNAME')['CENSUS2010POP'].apply(lambda x: x.nlargest(3).sum()).nlargest(3).index.values.tolist()
return states
answer_six()
Upvotes: 2
Reputation: 1000
Try adding sum() to the function:
df = pd.DataFrame({'STNAME': ['Alabama', 'Alabama', 'Alabama', 'Alabama', 'Alaska', 'Alaska', 'Alaska', 'Alaska','New York', 'New York', 'New York', 'New York', 'California'],
'CTYNAME': ['a', 'b', 'c', 'd', 'z', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l'],
'CENSUS2010POP': [120000, 230000, 89000, 43000, 23000, 34000, 120000, 10000, 1200000, 1300000, 9800000, 560000, 45000]})
x=df.groupby('STNAME')['CENSUS2010POP'].sum().nlargest(3)
Output:
STNAME
New York 12860000
Alabama 482000
Alaska 187000
Name: CENSUS2010POP, dtype: int64
Upvotes: 5