I want to add a column to this DataFrame whose values are taken from the first row of each group.
I have the following dataset:
import pandas as pd
df = pd.DataFrame(data={'APT_ORG_CODE':['AAL','AAL','AAL','AAL','ZYI','ZYI'],'APT_ORG':['Aalborg Airport','Aalborg Airport','Aalborg Airport','Aalborg Airport','Zunyi','Zunyi'],'APT_DES':['Amsterdam','Amsterdam','Copenhagen Kastrup Apt','Copenhagen Kastrup Apt','Zhuhai','Zhuhai'],'APT_DES_CODE':['AMS','AMS','CPH','CPH','ZUH','ZUH'],'Month':[2,8,2,8,2,8],'Nb_flights':[85,60,209,213,4,13]})
I want to add two columns to the DataFrame: one with the number of flights in month 2 for each airport pair, and one with the number of flights in month 8 for each airport pair. For the month-2 column, I tried:
df = df.assign(newcol = lambda x: df.groupby(['APT_ORG_CODE','APT_ORG','APT_DES','APT_DES_CODE'],as_index=False)['Nb_flights'].first())
However, I get the following error:
incompatible index of inserted column with frame index
How would you do this?
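The error comes from an index mismatch: an aggregation such as first() returns one row per group, indexed by the group keys, while assign needs values that line up with the original six-row index. GroupBy.transform instead broadcasts each group's value back to every row of that group. A minimal sketch comparing the two shapes on the sample data (the names keys, agg and per_row are just for illustration):

```python
import pandas as pd

df = pd.DataFrame(data={
    'APT_ORG_CODE': ['AAL', 'AAL', 'AAL', 'AAL', 'ZYI', 'ZYI'],
    'APT_ORG': ['Aalborg Airport'] * 4 + ['Zunyi'] * 2,
    'APT_DES': ['Amsterdam', 'Amsterdam', 'Copenhagen Kastrup Apt',
                'Copenhagen Kastrup Apt', 'Zhuhai', 'Zhuhai'],
    'APT_DES_CODE': ['AMS', 'AMS', 'CPH', 'CPH', 'ZUH', 'ZUH'],
    'Month': [2, 8, 2, 8, 2, 8],
    'Nb_flights': [85, 60, 209, 213, 4, 13],
})

keys = ['APT_ORG_CODE', 'APT_ORG', 'APT_DES', 'APT_DES_CODE']

# Aggregation: one row per airport pair, indexed by the group keys (a MultiIndex)
agg = df.groupby(keys)['Nb_flights'].first()

# Transform: one value per original row, sharing df's index
per_row = df.groupby(keys)['Nb_flights'].transform('first')

print(len(agg), len(per_row), len(df))  # 3 6 6
print(per_row.index.equals(df.index))   # True
```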
Use GroupBy.transform without a lambda function:
df = df.assign(newcol = df.groupby(['APT_ORG_CODE','APT_ORG','APT_DES','APT_DES_CODE'])['Nb_flights'].transform('first'))
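In the sample data, 'first' happens to give the month-2 value only because each pair's rows are sorted by Month. A sketch that targets months 2 and 8 explicitly, assuming each airport pair has at most one row per month: mask Nb_flights with Series.where so rows of other months become NaN, then transform('first'), which skips NaN. The helper month_flights and the column names Nb_flights_m2 / Nb_flights_m8 are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(data={
    'APT_ORG_CODE': ['AAL', 'AAL', 'AAL', 'AAL', 'ZYI', 'ZYI'],
    'APT_ORG': ['Aalborg Airport'] * 4 + ['Zunyi'] * 2,
    'APT_DES': ['Amsterdam', 'Amsterdam', 'Copenhagen Kastrup Apt',
                'Copenhagen Kastrup Apt', 'Zhuhai', 'Zhuhai'],
    'APT_DES_CODE': ['AMS', 'AMS', 'CPH', 'CPH', 'ZUH', 'ZUH'],
    'Month': [2, 8, 2, 8, 2, 8],
    'Nb_flights': [85, 60, 209, 213, 4, 13],
})

keys = ['APT_ORG_CODE', 'APT_ORG', 'APT_DES', 'APT_DES_CODE']

def month_flights(frame, month):
    # Hypothetical helper: keep Nb_flights only on rows of the requested month,
    # all other rows become NaN
    masked = frame['Nb_flights'].where(frame['Month'].eq(month))
    # 'first' skips NaN, so every row of a pair receives that pair's month value
    return masked.groupby([frame[k] for k in keys]).transform('first')

df = df.assign(Nb_flights_m2=month_flights(df, 2),
               Nb_flights_m8=month_flights(df, 8))
print(df[['APT_ORG_CODE', 'APT_DES_CODE', 'Month', 'Nb_flights_m2', 'Nb_flights_m8']])
```

The new columns are float because of the intermediate NaN; if a pair has no row for a given month, its value stays NaN.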
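A lambda function is only necessary if you chain other DataFrame processing before assign, because inside the lambda x is the intermediate DataFrame rather than the original df. For example, with filtering: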
df = (df.query('APT_ORG_CODE == "AAL"')
.assign(newcol = lambda x: x.groupby(['APT_ORG_CODE','APT_ORG','APT_DES','APT_DES_CODE'])['Nb_flights'].transform('first')))
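A minimal sketch of the chained version on the sample data, checking that the lambda receives the filtered frame: only the four AAL rows remain, and newcol is computed from them (the names keys and out are just for illustration).

```python
import pandas as pd

df = pd.DataFrame(data={
    'APT_ORG_CODE': ['AAL', 'AAL', 'AAL', 'AAL', 'ZYI', 'ZYI'],
    'APT_ORG': ['Aalborg Airport'] * 4 + ['Zunyi'] * 2,
    'APT_DES': ['Amsterdam', 'Amsterdam', 'Copenhagen Kastrup Apt',
                'Copenhagen Kastrup Apt', 'Zhuhai', 'Zhuhai'],
    'APT_DES_CODE': ['AMS', 'AMS', 'CPH', 'CPH', 'ZUH', 'ZUH'],
    'Month': [2, 8, 2, 8, 2, 8],
    'Nb_flights': [85, 60, 209, 213, 4, 13],
})

keys = ['APT_ORG_CODE', 'APT_ORG', 'APT_DES', 'APT_DES_CODE']

out = (df.query('APT_ORG_CODE == "AAL"')
         # x is the 4-row result of query(), not the original 6-row df
         .assign(newcol=lambda x: x.groupby(keys)['Nb_flights'].transform('first')))

print(out[['APT_DES_CODE', 'Month', 'Nb_flights', 'newcol']])
```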