qwerty

Reputation: 887

Groupby and apply a defined function - Pandas

I have this df:

ID         Date   Time       Lat       Lon Time_1     Lat_1     Lon_1
 A  07/16/2019   08:00  29.39291 -98.50925  09:00  29.39923 -98.51256
 A  07/16/2019   09:00  29.39923 -98.51256  10:00  29.40147 -98.51123
 A  07/16/2019   10:00  29.40147 -98.51123  10:00  29.40147 -98.51123
 A  07/18/2019   08:30  29.38752 -98.52372  09:30  29.39291 -98.50925
 A  07/18/2019   09:30  29.39291 -98.50925  09:30  29.39291 -98.50925
 B  07/16/2019   08:00  29.39537 -98.50402  08:00  29.39537 -98.50402
 B  07/18/2019   11:00  29.39343 -98.49707  12:00  29.39291 -98.50925
 B  07/18/2019   12:00  29.39291 -98.50925  12:00  29.39291 -98.50925
 B  07/19/2019   10:00  29.39556 -98.53148  10:00  29.39556 -98.53148

I want to create a "Distance" column by grouping the df by ID and Date and applying a user-defined function to each group.

The code I wrote:

def grp_crossarc(f):

    for i in range(len(f)):

        f.loc[i,'Distance'] = crossarc(f.iloc[i]['Lat'],f.iloc[i]['Lon'],
                                         f.iloc[i]['Lat_1'],f.iloc[i]['Lat_1'],
                                         29.39537,-98.50402)
    return f

df.groupby(['ID','Date'],as_index=False).apply(grp_crossarc)

crossarc is another user-defined function that takes 6 arguments (3 lat/lon points).

The result I got:

  ID         Date   Time       Lat       Lon Time_1     Lat_1     Lon_1  Distance
   A  07/16/2019   08:00  29.39291 -98.50925  09:00  29.39923 -98.51256  0.166057
   A  07/16/2019   09:00  29.39923 -98.51256  10:00  29.40147 -98.51123  0.889147
   A  07/16/2019   10:00  29.40147 -98.51123  10:00  29.40147 -98.51123  0.973550
   A  07/18/2019   08:30  29.38752 -98.52372  09:30  29.39291 -98.50925       NaN
   A  07/18/2019   09:30  29.39291 -98.50925  09:30  29.39291 -98.50925       NaN
 NaN          NaN    NaN       NaN       NaN    NaN       NaN       NaN  0.736501
 NaN          NaN    NaN       NaN       NaN    NaN       NaN       NaN  0.165974
   B  07/16/2019   08:00  29.39537 -98.50402  08:00  29.39537 -98.50402       NaN
 NaN          NaN    NaN       NaN       NaN    NaN       NaN       NaN  0.000000
   B  07/18/2019   11:00  29.39343 -98.49707  12:00  29.39291 -98.50925       NaN
   B  07/18/2019   12:00  29.39291 -98.50925  12:00  29.39291 -98.50925       NaN
 NaN          NaN    NaN       NaN       NaN    NaN       NaN       NaN  0.707027
 NaN          NaN    NaN       NaN       NaN    NaN       NaN       NaN  0.165974
   B  07/19/2019   10:00  29.39556 -98.53148  10:00  29.39556 -98.53148       NaN
 NaN          NaN    NaN       NaN       NaN    NaN       NaN       NaN  1.900238

For some (ID, Date) groups, the Distance values end up in extra rows after the group instead of in the group's own rows, which are left as NaN. How can I fix this?

Upvotes: 2

Views: 106

Answers (1)

jezrael

Reputation: 862581

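The loop goes wrong because f.loc[i, 'Distance'] treats i as an index label, while each group keeps its original labels; for any group that does not start at label 0, new rows are created. You can use a lambda function with DataFrame.apply instead of the loop: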

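def grp_crossarc(f):
    # Row-wise apply keeps each result aligned with the group's own index labels.
    # Lon_1 is assumed for the 4th argument; the question passes Lat_1 twice,
    # which looks like a typo.
    f['Distance'] = (f.apply(lambda x: crossarc(x['Lat'],x['Lon'],
                                                x['Lat_1'],x['Lon_1'],
                                                29.39537,-98.50402), axis=1))
    return f

df = df.groupby(['ID','Date'],as_index=False).apply(grp_crossarc)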
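
To see why the original loop misplaced values, here is a minimal sketch with toy data and a trivial stand-in calculation (Lat * 10, not crossarc). It assumes only that a group keeps its original index labels, as pandas groups do: writing .loc[i, ...] with a positional i then creates new rows labelled 0, 1, ... for any group that does not start at label 0.

```python
import pandas as pd

# Toy frame; the values are placeholders, not the asker's coordinates.
df = pd.DataFrame({
    "ID": ["A", "A", "B", "B"],
    "Lat": [1.0, 2.0, 3.0, 4.0],
})

# The second group keeps its original index labels 2 and 3.
group_b = df[df["ID"] == "B"].copy()

# Buggy pattern: i is a position (0, 1), but .loc looks up labels,
# so rows labelled 0 and 1 are appended instead of filling rows 2 and 3.
buggy = group_b.copy()
for i in range(len(buggy)):
    buggy.loc[i, "Distance"] = buggy.iloc[i]["Lat"] * 10

# One possible fix: iterate over the group's actual index labels.
fixed = group_b.copy()
for label in fixed.index:
    fixed.loc[label, "Distance"] = fixed.loc[label, "Lat"] * 10

print(buggy)
print(fixed)
```

Printing both frames should show the buggy version gaining two extra rows with NaN in the original ones, much like the output in the question, while the label-based loop fills the group's own rows.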

Since the function does not seem to depend on the groups, this can be simplified by dropping groupby.apply altogether:

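# Lon_1 is assumed for the 4th argument; the question passes Lat_1 twice,
# which looks like a typo.
df['Distance'] = (df.apply(lambda x: crossarc(x['Lat'],x['Lon'],
                                              x['Lat_1'],x['Lon_1'],
                                              29.39537,-98.50402), axis=1))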
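
For a version that runs end to end, the sketch below applies the row-wise approach to three rows taken from the question. Because crossarc itself is not shown in the post, crossarc_stub is a hypothetical stand-in: it returns the haversine distance in km from the third point to the first and ignores the second, so its numbers will not match the asker's output. REF_LAT and REF_LON are just names for the fixed point used in the question.

```python
import math

import pandas as pd


def crossarc_stub(lat1, lon1, lat2, lon2, lat3, lon3):
    """Hypothetical stand-in for the asker's crossarc (not shown in the post).

    Returns the haversine distance in km from point 3 to point 1 and
    ignores point 2; it exists only so that the example runs.
    """
    earth_radius_km = 6371.0
    phi1, phi3 = math.radians(lat1), math.radians(lat3)
    dphi = phi3 - phi1
    dlmb = math.radians(lon3 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi3) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# Three rows copied from the question's table.
df = pd.DataFrame({
    "ID": ["A", "A", "B"],
    "Date": ["07/16/2019", "07/18/2019", "07/16/2019"],
    "Lat": [29.39291, 29.38752, 29.39537],
    "Lon": [-98.50925, -98.52372, -98.50402],
    "Lat_1": [29.39923, 29.39291, 29.39537],
    "Lon_1": [-98.51256, -98.50925, -98.50402],
})

REF_LAT, REF_LON = 29.39537, -98.50402  # fixed point from the question

# Row-wise apply on the whole frame: no groupby, so no index bookkeeping.
df["Distance"] = df.apply(
    lambda row: crossarc_stub(row["Lat"], row["Lon"],
                              row["Lat_1"], row["Lon_1"],
                              REF_LAT, REF_LON),
    axis=1,
)

print(df[["ID", "Date", "Distance"]])
```

Each result is assigned back by index label, so the frame keeps its original rows and no NaN rows appear.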

Upvotes: 1
