Reputation: 55
I have a dataframe and a maximum value:
import numpy as np
import pandas as pd

max_factor = 20
df = pd.DataFrame({'a': [1, 1, 1, 2, 2, 2, 3, 3, 3, 3], 'b': np.random.randn(10)})
df
   a         b
0  1 -0.424957
1  1  1.893320
2  1  0.187929
3  2 -1.413340
4  2  1.737371
5  2  0.959317
6  3 -0.554445
7  3  0.100595
8  3 -0.176009
9  3  0.430475
I want to create another column 'c' and populate it with values that depend on the current row's 'a' and 'b' and on the previous row's 'c'.
For example:
    if value(a) == 1 or value(a) == 2:
        value(c) = 0
    else:
        value(c) = value(b)/max_factor + value(c-1)   # value(c-1) is 'c' from the previous row
I have tried multiple ways to do this but am struggling. Do I have to iterate through each row or is there a faster way to do this?
EDIT: The actual function to generate the values in column 'c' is more complicated but this would be a great starting point.
Upvotes: 3
Views: 11928
Reputation: 15953
If you have more conditions than the ones in this example, you can use apply with a row-wise function:
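v = 0  # running value of 'c'; set it back to 0 before every apply call

def foo(row):
    global v
    if row['a'] == 1 or row['a'] == 2:
        v = 0
    else:
        # previous 'c' plus the scaled value of 'b'
        v = row['b'] / max_factor + v
    return v

df['c'] = df.apply(foo, axis=1)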
   a         b         c
0  1  0.858951  0.000000
1  1  0.588102  0.000000
2  1  1.452569  0.000000
3  2  1.400972  0.000000
4  2 -0.921342  0.000000
5  2 -1.117748  0.000000
6  3  0.792742  0.039637
7  3  0.254630  0.052369
8  3  0.351391  0.069938
9  3  1.822267  0.161052
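On the "is there a faster way" part: for this particular rule (reset to 0 when 'a' is 1 or 2, otherwise add b/max_factor to the previous 'c'), the recurrence is just a running sum that restarts at every reset row, so it can probably be done without a row loop. Below is a minimal sketch of that idea using a grouped cumsum. The names reset, block and step are mine, not from the question, and this only works while the real function stays a "running sum with resets"; a more complicated dependency on the previous 'c' would still need the row-by-row approach.

```python
import numpy as np
import pandas as pd

max_factor = 20
df = pd.DataFrame({'a': [1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
                   'b': np.random.randn(10)})

# Rows where 'a' is 1 or 2 reset the running value to 0.
reset = df['a'].isin([1, 2])

# Every reset row starts a new block; the rows after it share its block id.
block = reset.cumsum()

# Per-row increment: b / max_factor, or 0 on the reset rows themselves.
step = (df['b'] / max_factor).where(~reset, 0.0)

# A running total inside each block reproduces c[i] = step[i] + c[i-1].
df['c'] = step.groupby(block).cumsum()
```

This avoids the global variable and the per-row Python call, which should matter on large frames. I would still check it against the apply version on a small sample before relying on it.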
Upvotes: 3