user3203275
user3203275

Reputation: 305

Python Pandas Cumprod issue

I am still struggling with the following code:

xa= [0, 0, 0, 0, 65, 67, 69, 75, 0, 0, 0]
xb = [.3, .3, .3,.3, .3, .3, .3, .3, .3, .3, .3]
ideal = [0, 0, 0, 0, 65, 67, 69, 75, 67.5, 60.75, 54.675]

df = pd.DataFrame({'a':xa, 'b':xb, 'i':ideal})

mask=(df['a']<51) & (df['b']>0)
df['c'] = df['a'].where(mask,0.9).groupby(~mask.cumsum()).cumprod()

print(df)

I want 'c' column to become like 'ideal'. This is just a sample of my 100K+ rows full dataset.

'mask' is calculated likes this: When 'a'{i}<51 AND 'b'{i}>0 ? then TRUE else FALSE

'c' column is calculated like that: When 'mask'{i}=FALSE then 'c'{i}='a'{i} else 'c'{i}=0.9*'c'{i-1}

So I hope (one day) 'c' to become like 'ideal'....

Upvotes: 1

Views: 165

Answers (2)

Jondiedoop
Jondiedoop

Reputation: 3353

I believe this solves your problem:

# First calculate the column as if there is no decay
mask=(df['a']<51) & (df['b']>0)
df['c'] = df['a'].where(~mask)
df['c'].fillna(method='ffill', inplace=True)
df['c'].fillna(0, inplace=True)

# Check how many rows since the mask has changed from True to False or v.v.
df['ones'] = 1
df['power'] = df['ones'].groupby((mask != mask.shift()).cumsum()).transform('cumsum')
# For the values in the mask, apply the decay
df['c'] = np.where(mask, 0.9 ** df['power']*df['c'], df['c'])
print(df)

Output:

     a    b       i       c  power ones
0    0  0.3   0.000   0.000       1     1
1    0  0.3   0.000   0.000       2     1
2    0  0.3   0.000   0.000       3     1
3    0  0.3   0.000   0.000       4     1
4   65  0.3  65.000  65.000       1     1
5   67  0.3  67.000  67.000       2     1
6   69  0.3  69.000  69.000       3     1
7   75  0.3  75.000  75.000       4     1
8    0  0.3  67.500  67.500       1     1
9    0  0.3  60.750  60.750       2     1
10   0  0.3  54.675  54.675       3     1

The main trick is to define one column that defines how many times to multiply the 0.9 and another that is forward filled to check what the number would have been if there hadn't been a decay. Hope this helps!

Upvotes: 1

Vaishali
Vaishali

Reputation: 38415

If I understand correctly

mask=(df['a']<51) & (df['b']>0)
df['c'] = np.where(mask, df['i'].shift() * 0.9, df['i'])
df.fillna(0, inlace = True)

    a   b   i       c
0   0   0.3 0.000   0.000
1   0   0.3 0.000   0.000
2   0   0.3 0.000   0.000
3   0   0.3 0.000   0.000
4   65  0.3 65.000  65.000
5   67  0.3 67.000  67.000
6   69  0.3 69.000  69.000
7   75  0.3 75.000  75.000
8   0   0.3 67.500  67.500
9   0   0.3 60.750  60.750
10  0   0.3 54.675  54.675

Upvotes: 0

Related Questions