Reputation: 8166
The following code creates a random dataframe with values -1, 0 or 1:
df = pd.DataFrame(np.random.randint(-1,2,size=(100, 1)), columns=['val'])
print(df['val'].value_counts())
Let's see what it contains:
-1 36
0 35
1 29
Name: val, dtype: int64
Then, I'm trying to create a new column called mysum
with a cumulative conditional sum which following the next rules:
If val = 1 and mysum < 0, then mysum = mysum + 2.
If val = -1 and mysum <= 0, then mysum = mysum - 1.
If val = -1 and mysum > 0, then mysum = mysum - 2
If val = 0 and mysum < 0, then mysum = mysum + 1.
If val = 0 and mysum > 0, then mysum = mysum - 1.
If val = 0 and mysum = 0, then mysum = mysum.
So I'm afraid it is not as simple as:
df['mysum'] = df['val'].cumsum()
So I tried the following:
df['mysum'] = 0
df['mysum'] = np.where((df['val'] == 1) & (df['mysum'].cumsum() >= 0), (df['mysum'].cumsum() + 1), df['mysum'].cumsum())
df['mysum'] = np.where((df['val'] == 1) & (df['mysum'].cumsum() < 0), (df['mysum'].cumsum() + 2), df['mysum'].cumsum())
df['mysum'] = np.where((df['val'] == -1) & (df['mysum'].cumsum() <= 0), (df['mysum'].cumsum() - 1), df['mysum'].cumsum())
df['mysum'] = np.where((df['val'] == -1) & (df['mysum'].cumsum() > 0), (df['mysum'].cumsum() - 2), df['mysum'].cumsum())
df['mysum'] = np.where((df['val'] == 0) & (df['mysum'].cumsum() > 0), (df['mysum'].cumsum() - 1), df['mysum'].cumsum())
df['mysum'] = np.where((df['val'] == 0) & (df['mysum'].cumsum() < 0), (df['mysum'].cumsum() + 1), df['mysum'].cumsum())
print(df['mysum'].value_counts())
print(df)
But the column mysum
is not accumulating!
Here is a fiddle where you can try: https://repl.it/FaXZ/8
Upvotes: 1
Views: 654
Reputation: 616
A much more efficient solution, see also generalized cumulative functions in NumPy/SciPy? :
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(-1, 2, size=(100, 1)), columns=['val'])
def my_sum(acc,x):
if x == 0 and acc < 0:
return acc + 1
if x == 1 and acc < 0:
return acc + 2
if x == -1 and acc <= 0:
return acc - 1
if x == 0 and acc > 0:
return acc - 1
if x == -1 and acc > 0:
return acc - 2
if x == 1 and acc >= 0:
return acc + 1
if x == 0 and acc == 0:
return acc
u_my_sum = np.frompyfunc(my_sum, 2, 1)
df['mysum'] = u_my_sum.accumulate(df.val, dtype=np.object).astype(np.int64)
print(df)
Upvotes: 1
Reputation: 2859
Maybe there exists a more condensed solution, but you can loop through the dataframe and set values according to your conditions.
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(-1, 2, size=(100, 1)), columns=['val'])
df['mysum'] = 0
for index, row in df.iterrows():
# get the current value of mysum = mysum one row above current index
mysum = df.get_value(index - 1, 1, takeable=True)
# mysum at beginning is 0
if index == 0:
mysum = 0
# set values at current index according to conditions
if row[0] == 0 and mysum < 0:
df.set_value(index, 1, mysum + 1, takeable=True)
if row[0] == 1 and mysum < 0:
df.set_value(index, 1, mysum + 2, takeable=True)
if row[0] == -1 and mysum <= 0:
df.set_value(index, 1, mysum - 1, takeable=True)
if row[0] == 0 and mysum > 0:
df.set_value(index, 1, mysum - 1, takeable=True)
if row[0] == -1 and mysum > 0:
df.set_value(index, 1, mysum - 2, takeable=True)
if row[0] == 1 and mysum >= 0:
df.set_value(index, 1, mysum + 1, takeable=True)
if row[0] == 0 and mysum == 0:
df.set_value(index, 1, mysum, takeable=True)
print df
Upvotes: 1