leofields

Reputation: 669

Boolean indexing a DataFrame in pandas and replacing elements with the elements of a Series where the condition is True

I have a dataframe:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randn(3, 3))
>>> df
             0         1         2
   0 -0.685692  0.180900  0.652838
   1  0.484584 -0.441004 -1.617281
   2 -0.665110  1.196987 -0.133439 

I want to replace every element that meets a condition (for example, > 0) with the element of a Series s that corresponds to its row. s has length df.shape[0]:

>>> s = pd.Series((3, 4, 5))
>>> s
   0    3
   1    4
   2    5
   dtype: int64

It works with:

>>> df.where(df <= 0, s, axis=0)
             0         1         2
   0 -0.685692  3.000000  3.000000
   1  4.000000 -0.441004 -1.617281
   2 -0.665110  5.000000 -0.133439

But my real criterion is quite complex, so I want to state it positively (with > 0) instead of reversing it as in the where statement above (with <= 0), like this:

>>> df[df > 0] = s

Instead of a result, I get an error saying that the axis is missing. How can I specify the axis in the statement above?

Error Message:

Traceback (most recent call last):
  Python Shell, prompt 97, line 1
  File "/Users/a/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 2297, in __setitem__
    self._setitem_frame(key, value)
  File "/Users/a/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 2335, in _setitem_frame
    self.where(-key, value, inplace=True)
  File "/Users/a/anaconda/lib/python2.7/site-packages/pandas/core/generic.py", line 3940, in where
    fill_value=np.nan)
  File "/Users/a/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 2680, in align
    fill_axis=fill_axis, broadcast_axis=broadcast_axis)
  File "/Users/a/anaconda/lib/python2.7/site-packages/pandas/core/generic.py", line 3784, in align
    fill_axis=fill_axis)
  File "/Users/a/anaconda/lib/python2.7/site-packages/pandas/core/generic.py", line 3870, in _align_series
    raise ValueError('Must specify axis=0 or 1')
ValueError: Must specify axis=0 or 1

Upvotes: 4

Views: 3517

Answers (1)

leofields

Reputation: 669

The solution is obvious:

>>> df.where(~(df > 0), s, axis=0)

You simply have to negate the condition with ~ and keep axis=0, as in the where call from the question. Sometimes, after sitting too long in front of the screen, one tends to go blind!
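
The negation can probably be avoided altogether with DataFrame.mask, which replaces values where the condition is True (the inverse of where). Below is a minimal sketch, assuming a pandas version whose mask accepts other and axis. The numbers from the question are hard-coded instead of random so the result is reproducible, and the names cond, masked and negated are only illustrative:

```python
import numpy as np
import pandas as pd

# Values copied from the question's example output (assumption: fixed
# numbers instead of np.random.randn so the result is reproducible).
df = pd.DataFrame([[-0.685692, 0.180900, 0.652838],
                   [0.484584, -0.441004, -1.617281],
                   [-0.665110, 1.196987, -0.133439]])
s = pd.Series((3, 4, 5))

# The criterion, stated positively.
cond = df > 0

# mask replaces entries where cond is True; axis=0 aligns s with the
# row index, so each row takes its own element of s.
masked = df.mask(cond, s, axis=0)

# Equivalent formulation with where and an explicit negation.
negated = df.where(~cond, s, axis=0)

print(masked)
```

Both calls return a new DataFrame and leave df untouched. If in-place modification is wanted, as with df[df > 0] = s, assigning the result back (df = df.mask(cond, s, axis=0)) is one simple option.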

Upvotes: 2
