Reputation: 391
I have a function that returns two values (call them Site and Date). I'm trying to use df.apply to create two new columns, one for each returned value. I don't want to apply the function two or more times, because that would take ages, so I need a way to fill two (or more) columns from a single call to the function. Here is my code.
df1[['Site','Site Date']] = df1.apply(
    lambda row: firstSite(biomass, row['lat'], row['long'], row['Date']),
    axis = 1)
The input value biomass is a dataframe of coordinates; 'lat', 'long' and 'Date' are all columns of df1. If I assign the result to a single column, df1['Site'], it works perfectly, but when I try to assign values to two columns I get this error.
ValueError: Shape of passed values is (999, 2), indices imply (999, 28)
def firstSite(biomass, lat, long, date):
    biomass['Date of Operation'] = pd.to_datetime(biomass['Date of Operation'])
    biomass = biomass[biomass['Date of Operation'] <= date]
    biomass['distance'] = biomass.apply(
        lambda row: distanceBetweenCm(lat, long, row['Lat'], row['Lng']),
        axis=1)
    biomass['Site Name'] = np.where((biomass['distance'] <= 2), biomass['Site Name'], "Null")
    biomass = biomass.drop_duplicates('Site Name')
    Site = biomass.loc[biomass['Date of Operation'].idxmin(),'Site Name']
    Lat = biomass.loc[biomass['Date of Operation'].idxmin(),'Lat']
    return Site, Lat
This function has a few tasks:
1 - It removes any rows from biomass where the date is after df1['Date'].
2 - If the distance between coordinates is more than 2, the 'Site Name' is changed to 'Null'
3 - It removes any duplicates from the site name, ensuring that there will only be one row with the value 'Null'.
4 - It returns the value of 'Site Name' & 'Lat' where the 'Date of Operation' is least.
I need my code to return the first (by date) record from biomass where the distance between the coordinates from df1 & biomass is less than 2km.
Ideally, I'd like to be able to return the first record for several different radii, such as the first biomass site within 2km, 4km, 6km, 8km and 10km.
Upvotes: 3
Views: 562
Reputation: 863541
I think your function needs to return a Series with 2 values:
df1 = pd.DataFrame({'A':list('abcdef'),
                    'lat':[4,5,4,5,5,4],
                    'long':[7,8,9,4,2,3],
                    'Date':pd.date_range('2011-01-01', periods=6),
                    'E':[5,3,6,9,2,4],
                    'F':list('aaabbb')})
print (df1)
   A       Date  E  F  lat  long
0  a 2011-01-01  5  a    4     7
1  b 2011-01-02  3  a    5     8
2  c 2011-01-03  6  a    4     9
3  d 2011-01-04  9  b    5     4
4  e 2011-01-05  2  b    5     2
5  f 2011-01-06  4  b    4     3
biomass = 10
def firstSite(a,b,c,d):
    return pd.Series([a + b, d])
df1[['Site','Site Date']] = df1.apply(lambda row: firstSite(biomass,
                                                            row['lat'], row['long'], row['Date']),
                                      axis = 1)
print (df1)
   A       Date  E  F  lat  long  Site  Site Date
0  a 2011-01-01  5  a    4     7    14 2011-01-01
1  b 2011-01-02  3  a    5     8    15 2011-01-02
2  c 2011-01-03  6  a    4     9    14 2011-01-03
3  d 2011-01-04  9  b    5     4    15 2011-01-04
4  e 2011-01-05  2  b    5     2    15 2011-01-05
5  f 2011-01-06  4  b    4     3    14 2011-01-06
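If you would rather keep returning a plain tuple, a possible alternative (assuming pandas 0.23 or newer) is to pass result_type='expand' to apply, which spreads each returned tuple across columns. The sketch below uses a hypothetical stand-in named first_site and a cut-down df1, not your real function or data:

```python
import pandas as pd

# Small stand-in for df1; only the columns the function needs.
df1 = pd.DataFrame({
    'lat': [4, 5, 4],
    'long': [7, 8, 9],
    'Date': pd.date_range('2011-01-01', periods=3),
})
biomass = 10  # placeholder value, as in the example above


def first_site(a, b, c, d):
    # Hypothetical stand-in for firstSite: returns a plain tuple.
    return a + b, d


# result_type='expand' turns each returned tuple into one row of a DataFrame.
expanded = df1.apply(
    lambda row: first_site(biomass, row['lat'], row['long'], row['Date']),
    axis=1,
    result_type='expand',
)
expanded.columns = ['Site', 'Site Date']

# The function still runs only once per row.
df1[['Site', 'Site Date']] = expanded
print(df1)
```

Either way the function is called once per row; the only difference is whether the function or apply is responsible for shaping the result into columns.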
Upvotes: 5