user3203275
user3203275

Reputation: 305

Python pandas dataframe add element to previous row values and create a new column

enter image description here

Volume = [-1 -2 0 1 3 -1 -2 -1 -2 2 -2 -1 0]

Net_rate = [1 0 0 1 0 1 2 3 0 0 1 0]

I have the Pandas dataframe 'Volume' and need to create the column 'Net_rate'.

The algorithmic calculation is like that:

if Volume[i]*Volume[i-1]>0 then Net_rate[i-1]+1 else 0

I tried the following code but returns error:

def f_DataSlider(df, input, n, name):
    """Calculate the moving average for the given data.

    :param df: pandas.DataFrame
    :param n: 
    :return: pandas.DataFrame
    """
    Slider = pd.Series(df[input].shift(n), name=str(name))
    df = df.join(Slider)
    return df

data = f_DataSlider(data, 'Volume', 1, 'Volume_1')

mask2 = (data['Volume']*dfohlc['Volume_1'])>0
dfohlc.loc[mask2, 'Net_rate'] = dfohlc['Net_rate'].shift(-1)+1
dfohlc.loc[~mask2, 'Net_rate'] = 1

How should I code it? Could anyone help me, please?

Upvotes: 1

Views: 162

Answers (1)

user3483203
user3483203

Reputation: 51185

Using numba

You can make use of numba here, just make sure to pip install numba first. This option is slower than the cumsum option, but it helps to demonstrate how an algorithm for this problem would look.

Setup

from numba import njit

Create a boolean series using shift:

s = df.Volume.mul(df.Volume.shift(-1)).gt(0).tolist()

@njit
def increment(s):
    rate = [int(s[0]>0)]
    for idx, el in enumerate(s[1:]):
        if el:
            rate.append(rate[idx]+1)
        else:
            rate.append(0)
    return rate

increment(s)

[1, 0, 0, 1, 0, 1, 2, 3, 0, 0, 1, 0, 0]

Using cumsum and groupby:

s = df.Volume.mul(df.Volume.shift(-1)).gt(0).astype(int)

s.groupby(s.ne(s.shift()).cumsum()).cumsum()

0     1
1     0
2     0
3     1
4     0
5     1
6     2
7     3
8     0
9     0
10    1
11    0
12    0
Name: Volume, dtype: int32

Explanation

We can use shift for the Volume[i] * Volume[i-1] check:

df.Volume.mul(df.Volume.shift(-1))

0     2.0
1    -0.0
2     0.0
3     3.0
4    -3.0
5     2.0
6     2.0
7     2.0
8    -4.0
9    -4.0
10    2.0
11   -0.0
12    NaN
Name: Volume, dtype: float64

We can take this series, and check it is greater than 0:

df.Volume.mul(df.Volume.shift(-1)).gt(0).astype(int)

0     1
1     0
2     0
3     1
4     0
5     1
6     1
7     1
8     0
9     0
10    1
11    0
12    0
Name: Volume, dtype: int32

Now you can see the structure of the output is becoming clear, but we need to increment based on consecutive ranges of 1, which we can do using a groupby trick to groupby consecutive values, and take the cumsum.

Upvotes: 1

Related Questions