Homunculus Reticulli

Reputation: 68366

Pandas mask with boolean operation: ValueError: Boolean array expected for the condition, not object

I am writing a quick-and-dirty data sanitization script, and I need to check that the data columns have the correct relative ordering (for example, that hi is never below op, lo, or cl).

The dataframe looks like this:

                    dt      op      hi        lo      cl       vol         adj        prev
1986-01-02  1986-01-02  177.00  177.00  177.0000  177.00   75.8732         0.0         NaN
1986-01-03  1986-01-03  176.00  176.00  176.0000  176.00   75.4447         0.0  1986-01-02
1986-01-06  1986-01-06  172.00  172.00  172.0000  172.00   73.7299         0.0  1986-01-03
1986-01-07  1986-01-07  167.00  167.00  167.0000  167.00   71.5868         0.0  1986-01-06
1986-01-09  1986-01-09  168.00  168.00  168.0000  168.00   72.0153         0.0  1986-01-07
...                ...     ...     ...       ...     ...       ...         ...         ...
2020-09-14  2020-09-14  102.20  105.60  101.6500  104.70  104.7000   9720916.0  2020-09-11
2020-09-15  2020-09-15  106.45  110.70  106.4500  109.25  109.2500  15923105.0  2020-09-14
2020-09-16  2020-09-16  107.95  112.55  107.9500  112.10  112.1000  15399144.0  2020-09-15
2020-09-17  2020-09-17  110.40  112.85  110.0500  112.00  112.0000   6737225.0  2020-09-16
2020-09-18  2020-09-18  111.50  111.75  109.3923  110.75  110.7500  25308704.0  2020-09-17

I want to create a mask like this:

mask = df[(df.hi >= df.op) & (df.hi >= df.lo) & (df.hi >= df.cl) & (df.lo <= df.op) & (df.lo <= df.cl)]

However, when I try to select from df using df[mask], I get the error message:

ValueError: Boolean array expected for the condition, not object

This is what I want to do:

  1. Set a boolean flag that holds the result of the test above
  2. Convert the boolean to an int (0 or 1)
  3. Sum the column of ints to see whether the total is nonzero

How do I set the flag in a column in the dataframe based on my test condition?

Upvotes: 0

Views: 3442

Answers (1)

LazyEval

Reputation: 859

The mask should be the boolean Series itself, not the result of indexing df with it:

mask = (df.hi >= df.op) & (df.hi >= df.lo) & (df.hi >= df.cl) & (df.lo <= df.op) & (df.lo <= df.cl)

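Then select the matching rows with df[mask]. In the question, mask was already a filtered DataFrame rather than a boolean Series, so df[mask] passed a non-boolean (object) condition, which is what raises the ValueError.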
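
To cover the three steps listed in the question (flag, int, sum), here is a minimal sketch. The sample rows and the column names "ok" and "ok_int" are made up for illustration; only op, hi, lo and cl come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names op/hi/lo/cl follow the question.
df = pd.DataFrame(
    {
        "op": [177.00, 102.20, 111.50],
        "hi": [177.00, 105.60, 111.75],
        "lo": [177.00, 101.65, 112.00],  # last row is deliberately bad: lo > hi
        "cl": [177.00, 104.70, 110.75],
    }
)

# Boolean Series: True where a row passes every ranking check.
mask = (
    (df.hi >= df.op) & (df.hi >= df.lo) & (df.hi >= df.cl)
    & (df.lo <= df.op) & (df.lo <= df.cl)
)

# 1. Store the boolean flag in a column ("ok" is an assumed name).
df["ok"] = mask

# 2. Convert the boolean flag to 0/1 ("ok_int" is an assumed name).
df["ok_int"] = mask.astype(int)

# 3. Sum the ints: number of rows that pass the checks.
n_ok = int(df["ok_int"].sum())

# For sanitization, the failing rows are usually the interesting ones.
n_bad = int((~mask).sum())
bad_rows = df[~mask]
```

If the goal is only to know whether any row is bad, (~mask).any() gives that directly without the int conversion.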

Upvotes: 2
