Light_B
Light_B

Reputation: 1800

The truth value of a Series is ambiguous while using 'if loop'

I've 2 data frames of unequal length df and df_IGR both of which have columns Lat and Lon which I want to compare.

When I do element-wise comparison such that I'm able to extract any column values from either of the two df's for which the difference in Lat and Lon between of the two df's is less than 0.005

for row in df_IGR.itertuples():
    print(df.index[((df.Lat - row.Lat) < 0.005) & ((df.Lon - row.Lon) < 0.005)])

This Works but when I'm trying to use the same condition in an if loop, it doesn't work.

for row in df_IGR.itertuples():
    count = 0
    if (((df.Lat - row.Lat) < 0.0005) & ((df.Lon - row.Lon) < 0.0005)):
        print (row.Name)
        count = count + 1
print (count)
>>ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Can anyone explain why it's not working? And how can I make the "if loop" working here? I already went through this answer but I don't understand what I need to change. https://stackoverflow.com/a/36922103/7763326

Edit: the data looks like

df.head()
>>
                 back_hr    Lat      Lon        AGL   Pressure
BT_time                 
1998-01-01 11:15:00  0.0    75.197  -150.045    1000.0  900.8
1998-01-01 11:00:00 -0.2    75.245  -150.001    1002.4  900.5
1998-01-01 10:00:00 -1.2    75.447  -149.863    1011.2  899.2
1998-01-01 09:00:00 -2.2    75.661  -149.785    1019.3  898.3
1998-01-01 08:00:00 -3.2    75.887  -149.762    1028.1  897.1

df_IGR.head()
>>
    ID          Lat     Lon   Elevation State   Name    FirstYr LastYr  NObs
193 BEM0000644  50.796  4.358   99       NaN    UCCLE   1949    2016    42564
195 BEM0000647  50.033  5.400   558      NaN    ST-HUBE 1964    2006    26357
241 BOM0002685  53.933  27.633  231      NaN    MINSK   1939    1998    43822
242 BOM0003300  52.116  23.683  144      NaN    BREST   1951    2007    40167
244 BOM0003304  52.401  30.963  126      NaN    GOMEL   1970    2017    32410

Upvotes: 0

Views: 514

Answers (2)

jpp
jpp

Reputation: 164623

Here's a readable one line answer. I've just added some data to make it more obvious what's happening. With the resulting Boolean arrays, you can apply any or all functions as you wish.

Lat1 = [3, 1, 4, 6, 1, 4, 9]  # df_IGR.Lat.values
Lat2 = [6, 7, 4, 1, 3]        # df.Lat.values

Lon1 = [5, 4, 5, 6, 8, 3, 9]  # df_IGR.Lon.values
Lon2 = [4, 2, 1, 6, 5]        # df.Lon.values

comparison = [[(i - j < 3) & (k - l < 3) for i, k in zip(Lat1, Lon1)] \
                                         for j, l in zip(Lat2, Lon2)]

# [[True, True, True, True, False, True, False],
#  [False, True, False, False, False, True, False],
#  [False, False, False, False, False, True, False],
#  [True, True, False, False, True, False, False],
#  [True, True, True, False, False, True, False]]

Upvotes: 0

Zhenya K.
Zhenya K.

Reputation: 345

df.Lat is a Series.

So is df.Lat - row.Lat.

So is (df.Lat - row.Lat) < 0.0005.

So is ((df.Lat - row.Lat) < 0.0005) & ((df.Lon - row.Lon) < 0.0005).

That last Series is a boolean Series that tells you which points in df are close enough to this one row in df_IGR. And this whole Series cannot just be True or False - this is what the error is saying.

I am not quite sure what exactly you are trying to achieve, so I can only explain the error.

If, however, you are trying to count the number of pairs of points that are close enough, you can do a nested loop:

count = 0
for row_IRG in df_IGR.itertuples():
    for row in df.itertuples():
        if (((row_IRG.Lat - row.Lat) < 0.0005) & ((row_IRG.Lon - row.Lon) < 0.0005)):
        print (row.Name, row_IRG.Name)
        count = count + 1
print (count)

Upvotes: 1

Related Questions