christaylor
christaylor

Reputation: 391

Pandas Return Conditional Value

just need a little bit of help in returning some values from a dataframe.


I've got a dataframe (call it df1) with some values:

ID      X    Y    Distance   Date
1       1    2    2.2        01/01/2000
2       2    3    1.8        02/02/2001
3       3    4    1.2        03/03/2002
4       4    5    2.7        04/04/2003 
5       5    6    3.8        05/05/2004

Currently I have code which creates a new column - df1['Within 2k'] - which returns True if the Distance is within 2 km. For example, this would look like:

df1['Within 2k'] = df1['distance'] <= 2
print("df1")

ID      X    Y    Distance   Date         Within 2k
1       1    2    2.2        01/01/2000   False
2       2    3    1.8        02/02/2001   True
3       3    4    1.2        03/03/2002   True
4       4    5    2.7        04/04/2003   False
5       5    6    3.8        05/05/2004   False

I also have code which changes ID & Distance to "Null" if they aren't within 2km. For instance, this looks like:

df1['ID'] = np.where((df1['Distance'] <= 2), df1['ID'], "Null")
df1['Distance'] = np.where((df1['Distance'] <= 2), df1['Distance'], "Null")
print(df1)

ID     X    Y    Distance   Date
Null   1    2    Null       01/01/2000
2      2    3    1.8        02/02/2001
3      3    4    1.2        03/03/2002
Null   4    5    Null       04/04/2003 
Null   5    6    Null       05/05/2004

The aim of my code is to return the first record (chronologically) where the distance is within 2km. Currently I have code which returns the value where the Date value is minimum, however that includes Null values.

My code at the moment looks a bit like this:

Site2km = df1.loc[df1['Date'].idxmin(),'ID']
Dist2km = df1.loc[df1['Date'].idxmin(),'Distance']

return pd.Series([Site2km, Dist2km])

I need some code which will:

1) Return the first ID & Distance where the Distance is less than 2

2) If every value in the table is outside the Distance 2km, return the strings "Null" for both ID & Distance.

Upvotes: 1

Views: 2455

Answers (1)

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210822

actually you don't need additional columns:

In [35]: df
Out[35]:
   ID  X  Y  Distance       Date
0   1  1  2       2.2 2000-01-01
1   2  2  3       1.8 2001-02-02
2   3  3  4       1.2 2002-03-03
3   4  4  5       2.7 2003-04-04
4   5  5  6       3.8 2004-05-05

In [36]: df.loc[df['Distance'] <= 2].nsmallest(1, 'Date')[['ID','Distance']]
Out[36]:
   ID  Distance
1   2       1.8

UPDATE:

In [47]: df
Out[47]:
   ID  X  Y  Distance       Date
0   1  1  2       2.2 2000-01-01
1   2  2  3       1.8 2001-02-02
2   3  3  4       1.2 2002-03-03
3   4  4  5       2.7 2003-04-04
4   5  5  6       3.8 2004-05-05

In [48]: r = df.loc[df['Distance'] <= 2].nsmallest(1, 'Date')[['ID','Distance']]

In [49]: r
Out[49]:
   ID  Distance
1   2       1.8

let's simulate the situation when we don't have any points within 2km:

In [50]: df.Distance += 10

In [51]: r = df.loc[df['Distance'] <= 2].nsmallest(1, 'Date')[['ID','Distance']]

In [52]: r
Out[52]:
Empty DataFrame
Columns: [ID, Distance]
Index: []

In [53]: if r.empty:
    ...:     r.loc[0] = [np.nan, np.nan]
    ...:

In [54]: r
Out[54]:
   ID  Distance
0 NaN       NaN

Upvotes: 2

Related Questions