Reputation: 1800
I've 2 data frames of unequal length df and df_IGR both of which have columns Lat and Lon which I want to compare.
When I do element-wise comparison such that I'm able to extract any column values from either of the two df's for which the difference in Lat and Lon between of the two df's is less than 0.005
for row in df_IGR.itertuples():
print(df.index[((df.Lat - row.Lat) < 0.005) & ((df.Lon - row.Lon) < 0.005)])
This Works but when I'm trying to use the same condition in an if loop, it doesn't work.
for row in df_IGR.itertuples():
count = 0
if (((df.Lat - row.Lat) < 0.0005) & ((df.Lon - row.Lon) < 0.0005)):
print (row.Name)
count = count + 1
print (count)
>>ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Can anyone explain why it's not working? And how can I make the "if loop" working here? I already went through this answer but I don't understand what I need to change. https://stackoverflow.com/a/36922103/7763326
Edit: the data looks like
df.head()
>>
back_hr Lat Lon AGL Pressure
BT_time
1998-01-01 11:15:00 0.0 75.197 -150.045 1000.0 900.8
1998-01-01 11:00:00 -0.2 75.245 -150.001 1002.4 900.5
1998-01-01 10:00:00 -1.2 75.447 -149.863 1011.2 899.2
1998-01-01 09:00:00 -2.2 75.661 -149.785 1019.3 898.3
1998-01-01 08:00:00 -3.2 75.887 -149.762 1028.1 897.1
df_IGR.head()
>>
ID Lat Lon Elevation State Name FirstYr LastYr NObs
193 BEM0000644 50.796 4.358 99 NaN UCCLE 1949 2016 42564
195 BEM0000647 50.033 5.400 558 NaN ST-HUBE 1964 2006 26357
241 BOM0002685 53.933 27.633 231 NaN MINSK 1939 1998 43822
242 BOM0003300 52.116 23.683 144 NaN BREST 1951 2007 40167
244 BOM0003304 52.401 30.963 126 NaN GOMEL 1970 2017 32410
Upvotes: 0
Views: 514
Reputation: 164623
Here's a readable one line answer. I've just added some data to make it more obvious what's happening. With the resulting Boolean arrays, you can apply any
or all
functions as you wish.
Lat1 = [3, 1, 4, 6, 1, 4, 9] # df_IGR.Lat.values
Lat2 = [6, 7, 4, 1, 3] # df.Lat.values
Lon1 = [5, 4, 5, 6, 8, 3, 9] # df_IGR.Lon.values
Lon2 = [4, 2, 1, 6, 5] # df.Lon.values
comparison = [[(i - j < 3) & (k - l < 3) for i, k in zip(Lat1, Lon1)] \
for j, l in zip(Lat2, Lon2)]
# [[True, True, True, True, False, True, False],
# [False, True, False, False, False, True, False],
# [False, False, False, False, False, True, False],
# [True, True, False, False, True, False, False],
# [True, True, True, False, False, True, False]]
Upvotes: 0
Reputation: 345
df.Lat
is a Series.
So is df.Lat - row.Lat
.
So is (df.Lat - row.Lat) < 0.0005
.
So is ((df.Lat - row.Lat) < 0.0005) & ((df.Lon - row.Lon) < 0.0005)
.
That last Series is a boolean Series that tells you which points in df
are close enough to this one row in df_IGR. And this whole Series cannot just be True or False - this is what the error is saying.
I am not quite sure what exactly you are trying to achieve, so I can only explain the error.
If, however, you are trying to count the number of pairs of points that are close enough, you can do a nested loop:
count = 0
for row_IRG in df_IGR.itertuples():
for row in df.itertuples():
if (((row_IRG.Lat - row.Lat) < 0.0005) & ((row_IRG.Lon - row.Lon) < 0.0005)):
print (row.Name, row_IRG.Name)
count = count + 1
print (count)
Upvotes: 1