christaylor

Reputation: 391

Applying Pandas Function to more than one column

I have a function that returns two values (call them Site and Date). I'm trying to use df.apply to create two new columns, one for each returned value. I don't want to apply the function two or more times because it would take ages, so I need a way to set two or more columns from a single call. Here is my code:

df1[['Site','Site Date']] = df1.apply(
    lambda row: firstSite(biomass, row['lat'], row['long'], row['Date']), 
    axis = 1)

The input biomass is a DataFrame of coordinates; 'lat', 'long' and 'Date' are all columns of df1. If I assign the result to df1['Site'] alone it works perfectly, but when I assign it to two columns I get this error:

ValueError: Shape of passed values is (999, 2), indices imply (999, 28)

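import numpy as np
import pandas as pd

def firstSite(biomass, lat, long, date):
    # Work on a copy so the caller's biomass frame is not modified
    biomass = biomass.copy()
    biomass['Date of Operation'] = pd.to_datetime(biomass['Date of Operation'])
    # Keep only sites already operating on the given date
    biomass = biomass[biomass['Date of Operation'] <= date].copy()

    # distanceBetweenCm is defined elsewhere (not shown)
    biomass['distance'] = biomass.apply(
        lambda row: distanceBetweenCm(lat, long, row['Lat'], row['Lng']),
        axis=1)
    # Sites further than 2 away are relabelled "Null"
    biomass['Site Name'] = np.where(biomass['distance'] <= 2, biomass['Site Name'], "Null")
    biomass = biomass.drop_duplicates('Site Name')
    # Earliest remaining record by date of operation
    first = biomass['Date of Operation'].idxmin()
    return biomass.loc[first, 'Site Name'], biomass.loc[first, 'Lat']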

This function does the following:

1 - It removes any rows from biomass whose 'Date of Operation' is after the date taken from df1['Date'].

2 - If the distance between the coordinates is more than 2, 'Site Name' is changed to 'Null'.

3 - It removes duplicate site names, ensuring that there is only one row with the value 'Null'.

4 - It returns the 'Site Name' and 'Lat' of the row with the earliest 'Date of Operation'.

I need my code to return the first record (by date) from biomass where the distance between the coordinates from df1 and biomass is less than 2km.

Eventually I'd like to return the first record for several different radii, such as the first biomass site within 2km, 4km, 6km, 8km and 10km.

Upvotes: 3

Views: 562

Answers (1)

jezrael

Reputation: 863711

I think your function needs to return a Series with 2 values:

df1 = pd.DataFrame({'A':list('abcdef'),
                   'lat':[4,5,4,5,5,4],
                   'long':[7,8,9,4,2,3],
                   'Date':pd.date_range('2011-01-01', periods=6),
                   'E':[5,3,6,9,2,4],
                   'F':list('aaabbb')})

print (df1)
   A       Date  E  F  lat  long
0  a 2011-01-01  5  a    4     7
1  b 2011-01-02  3  a    5     8
2  c 2011-01-03  6  a    4     9
3  d 2011-01-04  9  b    5     4
4  e 2011-01-05  2  b    5     2
5  f 2011-01-06  4  b    4     3

biomass = 10
def firstSite(a,b,c,d):
    return pd.Series([a + b, d])

df1[['Site','Site Date']] = df1.apply(lambda row: firstSite(biomass,
                                                  row['lat'], row['long'], row['Date']), 
                                                  axis = 1)
print (df1)
   A       Date  E  F  lat  long  Site  Site Date
0  a 2011-01-01  5  a    4     7    14 2011-01-01
1  b 2011-01-02  3  a    5     8    15 2011-01-02
2  c 2011-01-03  6  a    4     9    14 2011-01-03
3  d 2011-01-04  9  b    5     4    15 2011-01-04
4  e 2011-01-05  2  b    5     2    15 2011-01-05
5  f 2011-01-06  4  b    4     3    14 2011-01-06
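
An alternative, assuming a pandas version that supports the result_type argument of DataFrame.apply (0.23 or later), is to keep returning a plain tuple and let apply expand it into columns. This is only a minimal sketch: first_site is a hypothetical stand-in for the real function, and biomass is a dummy number as in the example above.

```python
import pandas as pd

df1 = pd.DataFrame({
    'lat': [4, 5, 4],
    'long': [7, 8, 9],
    'Date': pd.date_range('2011-01-01', periods=3),
})

biomass = 10  # dummy stand-in for the real biomass DataFrame

def first_site(a, b, c, d):
    # Hypothetical stand-in: returns a plain tuple, like the original firstSite
    return a + b, d

# result_type='expand' turns each returned tuple into one column per element
expanded = df1.apply(
    lambda row: first_site(biomass, row['lat'], row['long'], row['Date']),
    axis=1,
    result_type='expand',
)

# The expanded frame has columns 0 and 1; name them, then attach to df1
expanded.columns = ['Site', 'Site Date']
df1 = df1.join(expanded)
print(df1)
```

Either way the function is called only once per row, so the expensive lookup is not repeated for each new column.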

Upvotes: 5
