zlee11

Reputation: 69

Inverting logic for Bitwise operator (&, | ) on columns with objects

This is a continuation of the method used in this question.

Say we have a dataframe

    Make  Model       Year  HP     Cylinders  Transmission  MPG-H  MPG-C  Price
0   BMW   1 Series M  2011  335.0  6.0        MANUAL        26     19     46135
1   BMW   1 Series    2011  300.0  6.0        MANUAL        28     19     40650
2   BMW   1 Series    2011  300.0  6.0        MANUAL        28     20     36350
3   BMW   1 Series    2011  230.0  6.0        MANUAL        28     18     29450
4   BMW   1 Series    2011  230.0  6.0        MANUAL        28     18     34500
...

Using the interquartile range (IQR, i.e. the middle 50%), I created two variables, upper and lower. The specific calculation isn't important for this discussion, but here is an example of upper:

Year          2029.50
HP             498.00
Cylinders        9.00
MPG-H           42.00
MPG-C           31.00
Price        75291.25
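For reference, bounds like these are often computed with the 1.5 × IQR rule. The question does not show the calculation, so the following is only a minimal sketch under that assumption; the toy frame and the names q1, q3 and iqr are made up for illustration.

```python
import pandas as pd

# Toy data; the values are illustrative, not taken from the full dataset.
df = pd.DataFrame({
    "Make": ["BMW", "BMW", "Audi", "Fiat", "Audi"],
    "HP": [335.0, 300.0, 230.0, 160.0, 200.0],
    "Price": [46135, 40650, 36350, 29450, 34500],
})

# numeric_only=True skips the text column, so the resulting Series
# are indexed by the numeric column names only.
q1 = df.quantile(0.25, numeric_only=True)
q3 = df.quantile(0.75, numeric_only=True)
iqr = q3 - q1

# Assumed rule: 1.5 * IQR below Q1 and above Q3.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

print(upper)
```

Note that lower and upper carry no entry for Make, which matters for the comparisons further down.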

As expected, it only has values for the numeric (int64/float64) columns.

When I want to filter out values that lie outside of the IQR,

correct_df = df[~((df < lower) |(df > upper)).any(axis=1)]

it gives me the right answer. However, when I invert the logic to use & instead of |, I get an empty dataframe. Here is the code:

another_df = df[((df >= lower) & (df <= upper)).all(axis=1)]

This gives an empty result (only the column headers):

Make    Model   Year    HP  Cylinders   Transmission    Drive Mode  MPG-H   MPG-C   Price

It can be fixed by converting the index of upper/lower into a list (lst) and comparing only those columns:

another_df = df[((df[lst] >= lower) & (df[lst] <= upper)).all(axis=1)]

It seems like & and | behave differently for non-numerical columns? Why does that happen?

Upvotes: 2

Views: 63

Answers (1)

user17242583


& and | behave just as you'd expect; they're not the problem. The problem is that you're using all in the code that doesn't work, but in the code that does work, you're using any.

In the first example you say "drop all rows where ANY column of the row is less than lower OR greater than upper".

In the second example you say "select all rows where ALL columns of the row are greater than or equal to lower AND less than or equal to upper".

Change all to any and you should be fine.
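To make the any/all distinction concrete, here is a minimal sketch on toy data (the frame, the bounds, and the names cols and num are made up for illustration). It assumes, which this thread does not confirm, that comparing a text column against bounds that have no matching label yields False everywhere. That column is simulated with assign(Make=False) rather than by relying on how a particular pandas version compares strings.

```python
import pandas as pd

# Toy frame with one text column and two numeric columns.
df = pd.DataFrame({
    "Make": ["BMW", "BMW", "Audi"],
    "HP": [335.0, 300.0, 900.0],
    "Price": [46135, 40650, 36350],
})
lower = pd.Series({"HP": 100.0, "Price": 10000.0})
upper = pd.Series({"HP": 498.0, "Price": 75291.25})

cols = list(lower.index)   # numeric columns only, like 'lst' in the question
num = df[cols]

outside = (num < lower) | (num > upper)    # True where a value is out of bounds
inside = (num >= lower) & (num <= upper)   # True where a value is within bounds

# On the numeric columns alone, both formulations keep the same rows.
kept_any = df[~outside.any(axis=1)]
kept_all = df[inside.all(axis=1)]

# Simulate a column whose comparisons are all False (assumption, see above).
outside_with_text = outside.assign(Make=False)
inside_with_text = inside.assign(Make=False)

# any(...) ignores the all-False column, so this filter is unchanged ...
still_kept = df[~outside_with_text.any(axis=1)]
# ... but all(...) needs every column to be True, so no row survives.
now_empty = df[inside_with_text.all(axis=1)]

print(len(kept_any), len(kept_all), len(still_kept), len(now_empty))
```

On the numeric subset the two filters agree, as De Morgan's laws predict; only the all(...) form is affected by a column that is False throughout.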

Upvotes: 1
