Python Pandas dataframe shift does not work in apply functions

Question

I am getting the error below

AttributeError: ("'float' object has no attribute 'shift'", 'occurred at index 718170')

when running the pandas script below.

def volumediff(x):
    if x['positive_mvt'] == True:
        volume_d = x['volume'].shift(1)
    else:
        volume_d = ""
    return volume_d

df['new_volume'] = df.apply(volumediff,axis=1)

Based on a similar error (AttributeError: 'float' object has no attribute 'split'), I thought the issue was caused by a null value, since shift reaches for a value that might fall outside my dataset. However, the following works without any issue:

df['new_volume'] = df['volume'].shift(1)

Unfortunately it just doesn't work with an apply function, which I need because I need to use "if else".

I tried to work around it with the script below, using a try/except to skip any cells that raise an error. But I get "NA" or "" for every value in the column, which shouldn't be the case.

def volumediff(x):
    if x['positive_mvt'] == True:
        try:
            volume_d = x['volume'].shift(1)
        except:
            volume_d = "NA"
    else:
        volume_d = ""
    return volume_d

df['new_volume'] = df.apply(volumediff,axis=1)

Original sample df:

import pandas as pd

x = [
    [False, 240.20353],
    [False, 621.28854],
    [True, 64.85972],
    [True, 151.86484],
    [False, 190.91042],
    [True, 128.78566],
    [False, 415.53138],
    [True, 43.14669],
    [True, 512.03531],
    [True, 502.41939],
]

df = pd.DataFrame(x, columns=['positive_mvt', 'volume'])

df
Out[1]: 
   positive_mvt     volume
0         False  240.20353
1         False  621.28854
2          True   64.85972
3          True  151.86484
4         False  190.91042
5          True  128.78566
6         False  415.53138
7          True   43.14669
8          True  512.03531
9          True  502.41939

Error example:

I checked my dataframe and suspected a conflict between my if condition, which only selects rows where positive_mvt is True, and x['volume'].shift(1), which needs the row above, and that row may be False. But that wasn't the cause: the script below, which has no condition at all, triggers the same AttributeError. It looks like .shift simply doesn't work inside apply.

def volumediff(x):
    volume_d = x['volume'].shift(1)
    return volume_d

df['new_volume'] = df.apply(volumediff,axis=1)

Does anyone have any insight into how to solve this without creating two separate columns and handling the if/else and the shift formula one after the other?

CJR · Accepted Answer

When you run apply it passes each column/row (in your case row) to the function that you're applying. If you call .shift() on a series it makes sense - you're shifting the series. Calling shift on a single value in your series, as you are doing, makes no sense (how do you shift 12? what does that even mean?).
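
To see what the applied function actually receives, here is a minimal sketch using the first three rows of the sample data. The helper name describe is made up for illustration. It checks that each cell reached through row['volume'] is a plain number with no .shift method, while the column as a whole does have one.

```python
import pandas as pd

df = pd.DataFrame(
    [[False, 240.20353], [False, 621.28854], [True, 64.85972]],
    columns=['positive_mvt', 'volume'],
)

def describe(row):
    # With axis=1, `row` is a single row as a Series indexed by column name,
    # so row['volume'] is one number, not the whole column.
    value = row['volume']
    return isinstance(value, float) and not hasattr(value, 'shift')

# One boolean per row: True means the cell is a scalar float without .shift.
is_scalar = df.apply(describe, axis=1)

# The full column is a Series, which is what .shift is defined on.
column_has_shift = hasattr(df['volume'], 'shift')
```

Here is_scalar should be True for every row and column_has_shift should be True, which is why the call fails inside apply but works on df['volume'] directly.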

What you want to be doing is this:

df['new_volume'] = df['volume'].shift(1)
df.loc[df['positive_mvt'] == False, 'new_volume'] = ""

Also, I don't know what your dtypes are, so be careful here: writing "" into the new column mixes strings with floats in the same column.
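
If a missing value is acceptable in place of the empty string, one possible variant (an assumption about what you need, not part of the original requirement) is to shift the whole column and then mask it with Series.where. This sketch reuses the sample data from the question and keeps the new column numeric. Recent pandas releases may also warn when a string is written into a float column, which this avoids.

```python
import pandas as pd

x = [
    [False, 240.20353],
    [False, 621.28854],
    [True, 64.85972],
    [True, 151.86484],
    [False, 190.91042],
    [True, 128.78566],
    [False, 415.53138],
    [True, 43.14669],
    [True, 512.03531],
    [True, 502.41939],
]
df = pd.DataFrame(x, columns=['positive_mvt', 'volume'])

# Shift the entire column down by one row, then keep the shifted value
# only where positive_mvt is True; every other row becomes NaN.
df['new_volume'] = df['volume'].shift(1).where(df['positive_mvt'])
```

Rows where positive_mvt is False end up as NaN rather than "", so new_volume stays float64 and can still be used in arithmetic.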
