Reputation: 391
just need a little bit of help in returning some values from a dataframe.
I've got a dataframe (call it df1) with some values:
ID X Y Distance Date
1 1 2 2.2 01/01/2000
2 2 3 1.8 02/02/2001
3 3 4 1.2 03/03/2002
4 4 5 2.7 04/04/2003
5 5 6 3.8 05/05/2004
Currently I have code which creates a new column - df1['Within 2k'] - which returns True if the Distance is within 2 km. For example, this would look like:
df1['Within 2k'] = df1['distance'] <= 2
print("df1")
ID X Y Distance Date Within 2k
1 1 2 2.2 01/01/2000 False
2 2 3 1.8 02/02/2001 True
3 3 4 1.2 03/03/2002 True
4 4 5 2.7 04/04/2003 False
5 5 6 3.8 05/05/2004 False
I also have code which changes ID & Distance to "Null" if they aren't within 2km. For instance, this looks like:
df1['ID'] = np.where((df1['Distance'] <= 2), df1['ID'], "Null")
df1['Distance'] = np.where((df1['Distance'] <= 2), df1['Distance'], "Null")
print(df1)
ID X Y Distance Date
Null 1 2 Null 01/01/2000
2 2 3 1.8 02/02/2001
3 3 4 1.2 03/03/2002
Null 4 5 Null 04/04/2003
Null 5 6 Null 05/05/2004
The aim of my code is to return the first record (chronologically) where the distance is within 2km. Currently I have code which returns the value where the Date value is minimum, however that includes Null values.
My code at the moment looks a bit like this:
Site2km = df1.loc[df1['Date'].idxmin(),'ID']
Dist2km = df1.loc[df1['Date'].idxmin(),'Distance']
return pd.Series([Site2km, Dist2km])
I need some code which will:
1) Return the first ID & Distance where the Distance is less than 2
2) If every value in the table is outside the Distance 2km, return the strings "Null" for both ID & Distance.
Upvotes: 1
Views: 2455
Reputation: 210822
actually you don't need additional columns:
In [35]: df
Out[35]:
ID X Y Distance Date
0 1 1 2 2.2 2000-01-01
1 2 2 3 1.8 2001-02-02
2 3 3 4 1.2 2002-03-03
3 4 4 5 2.7 2003-04-04
4 5 5 6 3.8 2004-05-05
In [36]: df.loc[df['Distance'] <= 2].nsmallest(1, 'Date')[['ID','Distance']]
Out[36]:
ID Distance
1 2 1.8
UPDATE:
In [47]: df
Out[47]:
ID X Y Distance Date
0 1 1 2 2.2 2000-01-01
1 2 2 3 1.8 2001-02-02
2 3 3 4 1.2 2002-03-03
3 4 4 5 2.7 2003-04-04
4 5 5 6 3.8 2004-05-05
In [48]: r = df.loc[df['Distance'] <= 2].nsmallest(1, 'Date')[['ID','Distance']]
In [49]: r
Out[49]:
ID Distance
1 2 1.8
let's simulate the situation when we don't have any points within 2km:
In [50]: df.Distance += 10
In [51]: r = df.loc[df['Distance'] <= 2].nsmallest(1, 'Date')[['ID','Distance']]
In [52]: r
Out[52]:
Empty DataFrame
Columns: [ID, Distance]
Index: []
In [53]: if r.empty:
...: r.loc[0] = [np.nan, np.nan]
...:
In [54]: r
Out[54]:
ID Distance
0 NaN NaN
Upvotes: 2