DISC-O

Reputation: 332

Why can't I apply shift from within a pandas function?

I am trying to build a function that uses .shift() but it is giving me an error. Consider this:

In [40]:

import pandas as pd

data={'level1':[20,19,20,21,25,29,30,31,30,29,31],
      'level2': [10,10,20,20,20,10,10,20,20,10,10]}
index= pd.date_range('12/1/2014', periods=11)
frame=pd.DataFrame(data, index=index)
frame

Out[40]:
            level1  level2
2014-12-01      20      10
2014-12-02      19      10
2014-12-03      20      20
2014-12-04      21      20
2014-12-05      25      20
2014-12-06      29      10
2014-12-07      30      10
2014-12-08      31      20
2014-12-09      30      20
2014-12-10      29      10
2014-12-11      31      10

A normal function works fine. To demonstrate I calculate the same result twice, using the direct and function approach:

In [63]:
frame['horizontaladd1']=frame['level1']+frame['level2']#works

def horizontaladd(x):
    test=x['level1']+x['level2']
    return test
frame['horizontaladd2']=frame.apply(horizontaladd, axis=1)
frame
Out[63]:
            level1  level2  horizontaladd1  horizontaladd2
2014-12-01      20      10              30              30
2014-12-02      19      10              29              29
2014-12-03      20      20              40              40
2014-12-04      21      20              41              41
2014-12-05      25      20              45              45
2014-12-06      29      10              39              39
2014-12-07      30      10              40              40
2014-12-08      31      20              51              51
2014-12-09      30      20              50              50
2014-12-10      29      10              39              39
2014-12-11      31      10              41              41

But while calling shift directly works, calling it inside a function passed to apply does not:

frame['verticaladd1']=frame['level1']+frame['level1'].shift(1)#works

def verticaladd(x):
    test=x['level1']+x['level1'].shift(1)
    return test
frame.apply(verticaladd)#error

results in

KeyError: ('level1', u'occurred at index level1')

I also tried applying to a single column which makes more sense in my mind, but no luck:

def verticaladd2(x):
    test=x-x.shift(1)
    return test
frame['level1'].map(verticaladd2)#error, also with apply

error:

AttributeError: 'numpy.int64' object has no attribute 'shift'

Why not call shift directly? I need to embed it in a function so that I can calculate multiple columns at the same time, along axis 1. See the related question "Ambiguous truth value with boolean logic".

Upvotes: 3

Views: 14526

Answers (3)

DRB

Reputation: 83

Check whether the values you are trying to shift are a NumPy array rather than a Series. If they are, convert the array to a Series first; then you will be able to shift the values. I had the same issue, and after the conversion I can get the shifted values.

Here is the relevant part of my code for reference:

X = grouped['Confirmed_day'].values
X_series=pd.Series(X)

X_lag1 = X_series.shift(1)
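
For anyone who wants to run this without my data, here is a minimal self-contained sketch of the same idea. The array contents and the names values, series and lagged are made up for illustration; they stand in for grouped['Confirmed_day'].values above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the array returned by .values
values = np.array([20, 19, 20, 21])

# A NumPy array has no .shift() method; wrapping it in a Series provides one.
series = pd.Series(values)

# Shift down by one row: the first entry becomes NaN, the rest move down.
lagged = series.shift(1)
print(lagged)
```

Note that a Series built this way gets a fresh integer index. If the original index matters, it can be passed along with pd.Series(values, index=...).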

Upvotes: 2

JAB

Reputation: 12801

Try passing the frame to the function, rather than using apply (I am not sure why apply doesn't work, even column-wise):

def f(x):
    # x is the whole DataFrame, so x.level1 is a Series and has .shift()
    return x.level1 + x.level1.shift(1)

f(frame)

returns:

2014-12-01   NaN
2014-12-02    39
2014-12-03    39
2014-12-04    41
2014-12-05    46
2014-12-06    54
2014-12-07    59
2014-12-08    61
2014-12-09    61
2014-12-10    59
2014-12-11    60
Freq: D, Name: level1, dtype: float64
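
Since you want several columns at once, the same approach might be extended roughly as follows. This is only a sketch: the helper name vertical_and_cross and the output column names are hypothetical, and it assumes the frame from the question.

```python
import pandas as pd

data = {'level1': [20, 19, 20, 21, 25, 29, 30, 31, 30, 29, 31],
        'level2': [10, 10, 20, 20, 20, 10, 10, 20, 20, 10, 10]}
frame = pd.DataFrame(data, index=pd.date_range('12/1/2014', periods=11))

def vertical_and_cross(df):
    # Hypothetical helper: receives the whole DataFrame, not one row or column.
    previous = df.shift(1)  # every column shifted down by one row
    out = pd.DataFrame(index=df.index)
    # Current level1 plus the previous row's level1
    out['verticaladd'] = df['level1'] + previous['level1']
    # Current level1 plus the previous row's level2 (mixes two columns)
    out['crossadd'] = df['level1'] + previous['level2']
    return out

result = vertical_and_cross(frame)
print(result)
```

Because the function works on whole columns, the shifted and unshifted values line up on the index and no apply is needed.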

Upvotes: 3

CodeMonkey

Reputation: 1835

I'm not entirely following along, but if frame['level1'].shift(1) works, then I can only imagine that frame['level1'] is not a numpy.int64 object, while whatever you are passing into the verticaladd function is. You probably need to look at your types.
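
One way to look at the types is a small probe like the one below. It is a sketch on a shortened, made-up version of the frame, and has_shift is a hypothetical helper. It relies on how pandas dispatches these calls: DataFrame.apply with the default axis hands the function one column at a time as a Series indexed by the dates (so the label 'level1' is not a key in it, which would explain the KeyError), while Series.map hands it one scalar at a time (which has no shift).

```python
import pandas as pd

frame = pd.DataFrame({'level1': [20, 19, 20], 'level2': [10, 10, 20]},
                     index=pd.date_range('12/1/2014', periods=3))

def has_shift(x):
    # Hypothetical probe: does the object passed in offer a .shift() method?
    return hasattr(x, 'shift')

# Column-wise apply: each column arrives as a Series, so shift is available.
per_column = frame.apply(has_shift)

# Element-wise map: each value arrives as a scalar, so shift is missing.
per_value = frame['level1'].map(has_shift)

# The column Series is indexed by dates, not by column names.
indexed_by_dates = isinstance(frame['level1'].index, pd.DatetimeIndex)

# With column-wise apply, shift works if the function treats x as the column.
diff = frame.apply(lambda col: col - col.shift(1))

print(per_column, per_value, indexed_by_dates, diff, sep='\n')
```

So inside a column-wise apply the function should use x and x.shift(1) directly rather than x['level1'].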

Upvotes: -1
