np.where, multiple and or statement, two data frames

Question

I have two different data frames (nv1 and db1). I want to create a new column in nv1 named Novel_in_Database with value "Yes" or "No" based on multiple conditions.

The conditions are: the value of column MinBP of nv1 is greater than the value of column min_bp of db1 and smaller than max_bp of db1, or the value of column MaxBP of nv1 is greater than the value of min_bp of db1 and smaller than max_bp of db1. I hope the conditions are clear. Here are my data frames.

db1
min_bp  max_bp
11  22
20  30
38  52

And

nv1
    MinBP   MaxBP
    10  20
    15  25
    36  50
    60  80
    85  96

The new nv1 would be:

MinBP   MaxBP   Novel_in_Database
10  20  No
15  25  No
36  50  Yes
60  80  Yes
85  96  Yes

So far, I have tried the following:

nv1['Novel_in_Database']=np.where((((nv1.MinBP >= db1.min_bp) & (nv1.MinBP <= db1.max_bp)) |
                                   ((nv1.MaxBP <= db1.max_bp) & (nv1.MaxBP >= db1.min_bp))), 'No', 'Yes')

But it gives me an error: ValueError: Can only compare identically-labeled Series objects. The two data frames have different shapes. Any help?

Code Different · Accepted Answer

You can use numpy's array broadcasting to compare all elements in one array against all elements in another array:

import numpy as np

# Plain 1-D arrays, one entry per row of nv1
a = nv1['MinBP'].to_numpy()
b = nv1['MaxBP'].to_numpy()

# Raise the columns in db1 by a dimension to enable numpy's broadcasting
c = db1['min_bp'].to_numpy()[:, None]
d = db1['max_bp'].to_numpy()[:, None]

result = ((a > c) & (a < d)) | ((b > c) & (b < d))
nv1['Novel_in_Database'] = np.where(result.any(axis=0), 'No', 'Yes')
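
To make the shapes concrete, here is a self-contained sketch of the same approach run on the sample data from the question. The frames are rebuilt by hand from the posted tables, and the name `hits` is only illustrative; the comparisons are strict, as in the stated conditions.

```python
import numpy as np
import pandas as pd

# Sample frames copied from the tables in the question
db1 = pd.DataFrame({"min_bp": [11, 20, 38], "max_bp": [22, 30, 52]})
nv1 = pd.DataFrame({"MinBP": [10, 15, 36, 60, 85],
                    "MaxBP": [20, 25, 50, 80, 96]})

a = nv1["MinBP"].to_numpy()             # shape (5,)
b = nv1["MaxBP"].to_numpy()             # shape (5,)
c = db1["min_bp"].to_numpy()[:, None]   # shape (3, 1)
d = db1["max_bp"].to_numpy()[:, None]   # shape (3, 1)

# (5,) against (3, 1) broadcasts to (3, 5):
# one row per db1 interval, one column per nv1 row
hits = ((a > c) & (a < d)) | ((b > c) & (b < d))

# An nv1 row gets "No" if either endpoint falls strictly inside any db1 interval
nv1["Novel_in_Database"] = np.where(hits.any(axis=0), "No", "Yes")
print(nv1)
```

With these strict comparisons, the row (36, 50) is marked "No" because its MaxBP of 50 lies between 38 and 52, so the result differs from the expected output posted in the question for that row.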
