Reputation: 7592
I'm trying to select all cells in a pandas DataFrame that meet a certain criterion when a specific column also meets a separate criterion.
Given the following DataFrame:
     A  B  C  D
1/1  0  1  0  1
1/2  2  1  1  1
1/3  3  0  1  0
1/4  1  0  1  2
1/5  1  0  1  1
1/6  2  0  2  1
1/7  3  5  2  3
I would like to select the data where a column is greater than its previous value, when D is also > 1. The syntax I'm currently trying to use is:
matches = df[(df > df.shift(1)) & (df.D > 1)]
However, when I do this, I receive the following error:
TypeError: Could not operate [array([nan, nan, nan, nan], dtype=object)] with block values [operands could not be broadcast together with shapes (2016) (4) ]
Note: the error is a direct copy and paste from my actual code, so the description and the shapes in the error do not correspond directly to my example DataFrame.
I know that the df.D > 1 is causing the problem, and comparing columns directly to D is valid (df > df.D for example). What is wrong with my syntax when trying to compare D to the value 1, and how could I accomplish this?
Upvotes: 4
Views: 2407
Reputation: 128948
This should work directly, but pandas doesn't yet have a broadcasting & (and) operator (this will happen in 0.14). Here's a workaround.
In [74]: df
Out[74]:
     A  B  C  D
1/1  0  1  0  1
1/2  2  1  1  1
1/3  3  0  1  0
1/4  1  0  1  2
1/5  1  0  1  1
1/6  2  0  2  1
1/7  3  5  2  3
This is a where operation: it essentially puts np.nan where the condition is False.
In [78]: x = df[df>df.shift(1)]
In [79]: x
Out[79]:
       A    B    C    D
1/1  NaN  NaN  NaN  NaN
1/2    2  NaN    1  NaN
1/3    3  NaN  NaN  NaN
1/4  NaN  NaN  NaN    2
1/5  NaN  NaN  NaN  NaN
1/6    2  NaN    2  NaN
1/7    3    5  NaN    3
Then select the rows by the 2nd condition:
In [80]: x[df.D>1]
Out[80]:
       A    B    C    D
1/4  NaN  NaN  NaN    2
1/7    3    5  NaN    3
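For anyone who wants to run the two steps above as a script, here is a minimal, self-contained sketch. The DataFrame is rebuilt from the example in the question; the names increased, matches, mask and combined are just illustrative. The last part is an optional variant, not something from the session above: it broadcasts the row condition across the columns by hand with NumPy, which keeps the original shape instead of dropping rows.

```python
import numpy as np
import pandas as pd

# Example data from the question.
df = pd.DataFrame(
    {
        "A": [0, 2, 3, 1, 1, 2, 3],
        "B": [1, 1, 0, 0, 0, 0, 5],
        "C": [0, 1, 1, 1, 1, 2, 2],
        "D": [1, 1, 0, 2, 1, 1, 3],
    },
    index=["1/1", "1/2", "1/3", "1/4", "1/5", "1/6", "1/7"],
)

# Step 1: keep cells greater than the previous row's value; the rest become NaN.
increased = df.where(df > df.shift(1))

# Step 2: keep only the rows where D > 1.
matches = increased[df["D"] > 1]
print(matches)

# Optional variant (an assumption about what a combined mask could look like):
# turn the row condition into a column vector so it broadcasts over all columns.
mask = (df > df.shift(1)).to_numpy() & (df["D"] > 1).to_numpy()[:, None]
combined = df.where(mask)  # same shape as df, NaN wherever the mask is False
print(combined.dropna(how="all"))
```

Both approaches should leave only the 1/4 and 1/7 rows with any non-NaN values; which one is more convenient depends on whether you want the non-matching rows dropped or kept as NaN.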
Upvotes: 4
Reputation: 6703
I think the problem is actually that the boolean array from the shift operation is one short of the other conditional. Try adding a False to the first conditional at index zero; you should then be able to combine the two conditionals.
If the problem really is with the second conditional, could you post the result of
df.dtypes
It looks like it's not an int type, given the NaN array error.
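A quick way to test both guesses is to print the shapes of the two conditions and the column dtypes. The sketch below uses a small made-up frame (the values are placeholders, not the asker's real data). As far as I know, shift keeps the original length and fills the first row with NaN, so the shapes would be expected to match, but it is worth checking on the real data.

```python
import pandas as pd

# Small placeholder frame; substitute the real DataFrame here.
df = pd.DataFrame({"A": [0, 2, 3], "D": [1, 2, 0]})

shift_mask = df > df.shift(1)  # element-wise comparison with the previous row
row_mask = df["D"] > 1         # one boolean per row

print(shift_mask.shape)  # rows and columns of the first condition
print(len(row_mask))     # number of rows in the second condition
print(df.dtypes)         # object dtype here would point to non-numeric data
```

If the row counts agree and the dtypes are numeric, the mismatch is more likely in how the two conditions are combined than in either condition on its own.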
Upvotes: 0