mclark1129

Reputation: 7592

How do I use a specific column's value in a Pandas DataFrame where clause?

I'm trying to select all cells in a pandas DataFrame that meet a certain criterion when a specific column also meets a separate criterion.

Given the following DataFrame:

      A    B    C    D
1/1   0    1    0    1
1/2   2    1    1    1
1/3   3    0    1    0 
1/4   1    0    1    2
1/5   1    0    1    1
1/6   2    0    2    1
1/7   3    5    2    3

I would like to somehow select the data where a column is greater than its previous value, when D is also > 1. The syntax I'm trying to use currently is:

matches = df[(df > df.shift(1)) & (df.D > 1)]

However, when I do this, I receive the following error:

TypeError: Could not operate [array([nan, nan, nan, nan], dtype=object)] with block values [operands could not be broadcast together with shapes (2016) (4) ]

Note: the error is a direct copy and paste from my actual code, so the description and the shape in the error do not correlate directly to my example DataFrame.

I know that the df.D > 1 is causing the problem, and comparing columns directly to D is valid (df > df.D for example). What is wrong with my syntax when trying to compare D to the value 1, and how could I accomplish this?

Upvotes: 4

Views: 2407

Answers (2)

Jeff

Reputation: 128948

This should work directly, but pandas doesn't yet have a broadcasting `and` operator (this will happen in 0.14). Here's a workaround.

In [74]: df
Out[74]: 
     A  B  C  D
1/1  0  1  0  1
1/2  2  1  1  1
1/3  3  0  1  0
1/4  1  0  1  2
1/5  1  0  1  1
1/6  2  0  2  1
1/7  3  5  2  3

This is a where operation: essentially, it puts np.nan wherever the condition is False.

In [78]: x = df[df>df.shift(1)]

In [79]: x
Out[79]: 
      A   B   C   D
1/1 NaN NaN NaN NaN
1/2   2 NaN   1 NaN
1/3   3 NaN NaN NaN
1/4 NaN NaN NaN   2
1/5 NaN NaN NaN NaN
1/6   2 NaN   2 NaN
1/7   3   5 NaN   3

Then select the rows that satisfy the second condition:

In [80]: x[df.D>1]
Out[80]: 
      A   B   C  D
1/4 NaN NaN NaN  2
1/7   3   5 NaN  3
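
For reference, the two steps above can be reproduced as a self-contained script. This is only a minimal sketch, assuming a reasonably recent pandas: the DataFrame is rebuilt by hand from the question's example, and the names `increased`, `x`, and `matches` are just illustrative.

```python
import pandas as pd

# Example data rebuilt from the question (index labels kept as plain strings).
df = pd.DataFrame(
    {
        "A": [0, 2, 3, 1, 1, 2, 3],
        "B": [1, 1, 0, 0, 0, 0, 5],
        "C": [0, 1, 1, 1, 1, 2, 2],
        "D": [1, 1, 0, 2, 1, 1, 3],
    },
    index=["1/1", "1/2", "1/3", "1/4", "1/5", "1/6", "1/7"],
)

# Step 1: element-wise condition; cells that did not increase become NaN.
increased = df > df.shift(1)
x = df.where(increased)

# Step 2: row-wise condition; keep only the rows where D > 1.
matches = x[df["D"] > 1]

print(matches)
```

Splitting the selection this way sidesteps combining a DataFrame mask with a Series mask in a single `&` expression, which is where the original attempt ran into trouble.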

Upvotes: 4

cwharland

Reputation: 6703

I think the problem is actually that the boolean array from the shift operation is one element shorter than the other conditional. Try adding a False to the first conditional at index zero; you should then be able to combine the two conditionals.

If the problem really is with the second conditional, could you post the result of

df.dtypes

It looks like it's not an int type, given the NaN array in the error.
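
A quick way to run that check on the question's example data might look like the sketch below (the DataFrame is rebuilt by hand, and `shifted` is just an illustrative name). One thing it shows: shifting an integer column puts NaN in the first row, so the shifted frame comes back as float even when the original columns are integers.

```python
import pandas as pd

# Example data rebuilt from the question.
df = pd.DataFrame(
    {
        "A": [0, 2, 3, 1, 1, 2, 3],
        "B": [1, 1, 0, 0, 0, 0, 5],
        "C": [0, 1, 1, 1, 1, 2, 2],
        "D": [1, 1, 0, 2, 1, 1, 3],
    },
    index=["1/1", "1/2", "1/3", "1/4", "1/5", "1/6", "1/7"],
)

# dtypes of the original columns (integer for this example data).
print(df.dtypes)

# dtypes after shifting; the NaN in the first row forces a float dtype.
shifted = df.shift(1)
print(shifted.dtypes)
```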

Upvotes: 0
