I am writing a quick-and-dirty data sanitization script, and I need to check that the open/high/low/close columns (op, hi, lo, cl) have the correct relative ordering: hi should be the largest of the four and lo the smallest.
The DataFrame looks like this:
                    dt      op      hi        lo      cl       vol         adj        prev
1986-01-02  1986-01-02  177.00  177.00  177.0000  177.00   75.8732         0.0         NaN
1986-01-03  1986-01-03  176.00  176.00  176.0000  176.00   75.4447         0.0  1986-01-02
1986-01-06  1986-01-06  172.00  172.00  172.0000  172.00   73.7299         0.0  1986-01-03
1986-01-07  1986-01-07  167.00  167.00  167.0000  167.00   71.5868         0.0  1986-01-06
1986-01-09  1986-01-09  168.00  168.00  168.0000  168.00   72.0153         0.0  1986-01-07
...                ...     ...     ...       ...     ...       ...         ...         ...
2020-09-14  2020-09-14  102.20  105.60  101.6500  104.70  104.7000   9720916.0  2020-09-11
2020-09-15  2020-09-15  106.45  110.70  106.4500  109.25  109.2500  15923105.0  2020-09-14
2020-09-16  2020-09-16  107.95  112.55  107.9500  112.10  112.1000  15399144.0  2020-09-15
2020-09-17  2020-09-17  110.40  112.85  110.0500  112.00  112.0000   6737225.0  2020-09-16
2020-09-18  2020-09-18  111.50  111.75  109.3923  110.75  110.7500  25308704.0  2020-09-17
I want to create a mask like this:
mask = df[(df.hi >= df.op) & (df.hi >= df.lo) & (df.hi >= df.cl) & (df.lo <= df.op) & (df.lo <= df.cl)]
However, when I try to select from df using df[mask], I get the error message:
ValueError: Boolean array expected for the condition, not object
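The error can be reproduced on a small frame. This is a sketch assuming a hypothetical three-row DataFrame (values invented, column names borrowed from the question); the comments trace where the non-boolean key comes from.

```python
import pandas as pd

# Hypothetical three-row frame; only the column names mirror the question.
df = pd.DataFrame({
    "dt": ["2020-09-16", "2020-09-17", "2020-09-18"],
    "op": [107.95, 110.40, 111.50],
    "hi": [112.55, 112.85, 111.75],
    "lo": [107.95, 110.05, 109.39],
    "cl": [112.10, 112.00, 110.75],
})

# The comparisons themselves produce a boolean Series.
cond = (df.hi >= df.op) & (df.hi >= df.lo) & (df.hi >= df.cl) & (df.lo <= df.op) & (df.lo <= df.cl)

# Wrapping them in df[...] filters the rows and returns a DataFrame.
mask = df[cond]
print(type(mask).__name__)  # DataFrame

error_message = None
try:
    # Indexing with a DataFrame makes pandas call df.where(mask), which
    # requires every column of the key to be boolean; "dt" holds strings.
    df[mask]
except ValueError as exc:
    error_message = str(exc)
    print(error_message)  # the same ValueError as in the question
```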
This is what I want to do: set a flag in a column of the DataFrame based on my test condition. How do I do that?
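The mask should be the boolean Series itself, without the outer df[...]. Wrapping the conditions in df[...] already filters the rows and returns a DataFrame; passing that DataFrame back into df[...] makes pandas treat it as a where() condition, which must be entirely boolean, hence the error. Build the mask like this instead: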
mask = (df.hi >= df.op) & (df.hi >= df.lo) & (df.hi >= df.cl) & (df.lo <= df.op) & (df.lo <= df.cl)
Then select the rows that pass the check with df[mask], or assign mask to a new column to store the flag for every row.
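A sketch of both uses, assuming a hypothetical four-column frame whose last row is deliberately inconsistent; the column name `valid` is made up for illustration.

```python
import pandas as pd

# Hypothetical OHLC rows; in the last row hi is below both op and cl,
# so that row should fail the check.
df = pd.DataFrame({
    "op": [107.95, 110.40, 111.50],
    "hi": [112.55, 112.85, 110.00],
    "lo": [107.95, 110.05, 109.39],
    "cl": [112.10, 112.00, 110.75],
})

# Boolean Series: True where hi is the maximum and lo the minimum of the row.
mask = (df.hi >= df.op) & (df.hi >= df.lo) & (df.hi >= df.cl) & (df.lo <= df.op) & (df.lo <= df.cl)

# Store the result of the check as a boolean flag column ("valid" is a made-up name).
df["valid"] = mask

# Rows that pass the check, and rows that fail it (~ negates the mask).
good = df[mask]
bad = df[~mask]
print(bad)
```

Because the flag is an ordinary boolean column, the failing rows can be pulled out again later with df[~df["valid"]].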