Python: Add a column that is the first row of a group

Question

I want to add a column to this dataframe whose values are taken from the first row of each group.

I have the following dataset:

df = pd.DataFrame(data={'APT_ORG_CODE':['AAL','AAL','AAL','AAL','ZYI','ZYI'],'APT_ORG':['Aalborg Airport','Aalborg Airport','Aalborg Airport','Aalborg Airport','Zunyi','Zunyi'],'APT_DES':['Amsterdam','Amsterdam','Copenhagen Kastrup Apt','Copenhagen Kastrup Apt','Zhuhai','Zhuhai'],'APT_DES_CODE':['AMS','AMS','CPH','CPH','ZUH','ZUH'],'Month':[2,8,2,8,2,8],'Nb_flights':[85,60,209,213,4,13]})

I want to add two columns to the dataframe: one with the number of flights in the second month for each airport pair, and one with the number of flights in the eighth month for each airport pair. I have tried this for the second-month column:

df = df.assign(newcol = lambda x: df.groupby(['APT_ORG_CODE','APT_ORG','APT_DES','APT_DES_CODE'],as_index=False)['Nb_flights'].first())

However, I get the following error:

incompatible index of inserted column with frame index

How would you do this?

jezrael · Accepted Answer

Use GroupBy.transform without a lambda function:

df = df.assign(newcol = df.groupby(['APT_ORG_CODE','APT_ORG','APT_DES','APT_DES_CODE'])['Nb_flights'].transform('first'))
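To see why transform helps here, a minimal self-contained sketch follows. It assumes a trimmed version of the question's data (only the two code columns are used as group keys, which identify the same airport pairs), and the names keys, per_group and flights_m8 are illustrative, not from the original post. The eighth-month column is one possible extension, not part of the answer above: it masks the other months with NaN and relies on 'first' returning the first non-null value in each group.

```python
import pandas as pd

# Trimmed version of the question's data (assumption: code columns suffice as keys).
df = pd.DataFrame({
    'APT_ORG_CODE': ['AAL', 'AAL', 'AAL', 'AAL', 'ZYI', 'ZYI'],
    'APT_DES_CODE': ['AMS', 'AMS', 'CPH', 'CPH', 'ZUH', 'ZUH'],
    'Month': [2, 8, 2, 8, 2, 8],
    'Nb_flights': [85, 60, 209, 213, 4, 13],
})
keys = ['APT_ORG_CODE', 'APT_DES_CODE']

# Aggregation returns one row per group, so it cannot be aligned with df's rows.
per_group = df.groupby(keys)['Nb_flights'].first()

# transform broadcasts each group's first value back onto every row of df.
df = df.assign(newcol=df.groupby(keys)['Nb_flights'].transform('first'))

# Possible extension for month 8: keep only month-8 values, then take the
# first non-null value per group (the result is float because of the NaN mask).
flights_m8 = (df.assign(_m8=df['Nb_flights'].where(df['Month'].eq(8)))
                .groupby(keys)['_m8'].transform('first'))
df = df.assign(flights_m8=flights_m8)

print(df)
```

Note that newcol holds the month-2 value only because each group's rows are ordered with month 2 first; if that ordering is not guaranteed, the masking approach used for flights_m8 does not depend on it.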

A lambda function is necessary if you chain some code that processes the DataFrame before assign, e.g. filtering:

df = (df.query('APT_ORG_CODE == "AAL"')
        .assign(newcol = lambda x: x.groupby(['APT_ORG_CODE','APT_ORG','APT_DES','APT_DES_CODE'])['Nb_flights'].transform('first')))
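A runnable sketch of the chained version, again assuming a trimmed copy of the question's data with the two code columns as group keys (keys and out are illustrative names). Inside assign, the lambda's argument x is the already-filtered frame, so the groupby sees only the rows that survived the query.

```python
import pandas as pd

# Trimmed version of the question's data (assumption: code columns suffice as keys).
df = pd.DataFrame({
    'APT_ORG_CODE': ['AAL', 'AAL', 'AAL', 'AAL', 'ZYI', 'ZYI'],
    'APT_DES_CODE': ['AMS', 'AMS', 'CPH', 'CPH', 'ZUH', 'ZUH'],
    'Month': [2, 8, 2, 8, 2, 8],
    'Nb_flights': [85, 60, 209, 213, 4, 13],
})
keys = ['APT_ORG_CODE', 'APT_DES_CODE']

# x is the intermediate result of query(), not the original df.
out = (df.query('APT_ORG_CODE == "AAL"')
         .assign(newcol=lambda x: x.groupby(keys)['Nb_flights'].transform('first')))

print(out)
```

The result is stored in out rather than df here so that the unfiltered frame stays available for comparison.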
