Reputation: 59274
I have a dataframe like this:
Name Range N
0 Name1 0.11 0.900000 <- # Beginning of Name1
1 Name1 0.20 0.000000
2 Name1 0.30 0.000000
3 Name2 0.40 0.300000 <- # Beginning of Name2
4 Name2 0.50 0.000000
5 Name3 0.60 0.500000 <- # Beginning of Name3
5 Name3 0.70 0.000000
6 Name3 0.80 0.000000
I want to change the values of N
.
Basically, I have a initial value for N for each name (which for names name1
, name2
and name3
are 0.90000
, 0.300000
and 0.500000
respectively), and I want, for each name, change the values of N's according to a function that depends on the N of the previous row.
My formula is as follows:
meaning the N on the i
th row is 2 times N of the previous row plus the Range of current row.
How can I do that using pandas? I have 1 Million rows so Iterating doesn't seem like a good idea. Any help is appreciated!
I though of using apply
function but it does not seem to work
Upvotes: 0
Views: 306
Reputation: 294328
def fill_this(d):
n = d.N.values
r = d.Range.values
a = np.append(n[0], r[1:])
m = a.size
k = np.arange(m)
t = np.tri(m)
v = np.power(2, (k[:, None] - k) * t) * t
return pd.Series((v * a).sum(1), d.index)
df.assign(N2=df.groupby('Name', group_keys=False).apply(fill_this))
Name Range N N2
0 Name1 0.11 0.9 0.9
1 Name1 0.20 0.0 2.0 # <-- 2 * 0.9 + .2
2 Name1 0.30 0.0 4.3 # <-- 2 * 2.0 + .3
3 Name2 0.40 0.3 0.3
4 Name2 0.50 0.0 1.1 # <-- 2 * 0.3 + .5
5 Name3 0.60 0.5 0.5
5 Name3 0.70 0.0 1.7 # <-- 2 * 0.5 + .7
6 Name3 0.80 0.0 4.2 # <-- 2 * 1.7 + .8
Explanation
Assuming the first value is N0
N1 = 2 * N0 + R1
N2 = 2 * (2 * N0 + R1) + R2
N3 = 2 * (2 * (2 * N0 + R1) + R2) + R3
Or, written another way
N1 = 2 ** 1 * N0 + 2 ** 0 * R1 + 0 * R2
N2 = 2 ** 2 * N0 + 2 ** 1 * R1 + 2 ** 0 * R2
N3 = 2 ** 3 * N0 + 2 ** 2 * R1 + 2 ** 1 * R2
I can see the pattern of powers of 2
1 0 0
2 1 0
3 2 1
I just need to line up my powers of 2, zero out the upper triangle, multiply by the correct array, and sum across rows.
Upvotes: 2
Reputation: 323286
df.N=df.N.replace({0:np.nan})
df.N=df.N.ffill()
df.N=(df.groupby('Name').cumcount()*2).replace({0:1})*df.N
df
Out[363]:
Name Range N
0 Name1 0.11 0.9
1 Name1 0.20 1.8
2 Name1 0.30 3.6
3 Name2 0.40 0.3
4 Name2 0.50 0.6
5 Name3 0.60 0.5
5 Name3 0.70 1.0
6 Name3 0.80 2.0
Upvotes: 3
Reputation: 375535
I would do this as a transform (of each group).
To calculate the contiguous groups one trick is to use shift/cumsum:
In [11]: df.Name != df.Name.shift()
Out[11]:
0 True
1 False
2 False
3 True
4 False
5 True
5 False
6 False
Name: Name, dtype: bool
In [12]: (df.Name != df.Name.shift()).cumsum()
Out[12]:
0 1
1 1
2 1
3 2
4 2
5 3
5 3
6 3
Name: Name, dtype: int64
In [13]: df["group"] = (df.Name != df.Name.shift()).cumsum()
Then we can "fill" each group (not you can add Ri to this function):
In [21]: def fill_range(start, n):
...: return np.full(n, start) * np.power(2, np.arange(0, n))
...:
In [22]: df.groupby("group")["N"].transform(lambda x: fill_range(x.iloc[0], len(x)))
Out[22]:
0 0.9
1 1.8
2 3.6
3 0.3
4 0.6
5 0.5
5 1.0
6 2.0
Name: N, dtype: float64
Upvotes: 3