rafaelc
rafaelc

Reputation: 59274

How to retrieve changed values of previous rows while within apply at pandas?

I have a dataframe like this:

    Name    Range       N
0   Name1   0.11    0.900000 <- # Beginning of Name1
1   Name1   0.20    0.000000
2   Name1   0.30    0.000000
3   Name2   0.40    0.300000 <- # Beginning of Name2
4   Name2   0.50    0.000000
5   Name3   0.60    0.500000 <- # Beginning of Name3
5   Name3   0.70    0.000000
6   Name3   0.80    0.000000

I want to change the values of N.

Basically, I have a initial value for N for each name (which for names name1, name2 and name3 are 0.90000, 0.300000 and 0.500000 respectively), and I want, for each name, change the values of N's according to a function that depends on the N of the previous row.

My formula is as follows:

formula

meaning the N on the ith row is 2 times N of the previous row plus the Range of current row.

How can I do that using pandas? I have 1 Million rows so Iterating doesn't seem like a good idea. Any help is appreciated!

I though of using apply function but it does not seem to work

Upvotes: 0

Views: 306

Answers (3)

piRSquared
piRSquared

Reputation: 294328

def fill_this(d):
    n = d.N.values
    r = d.Range.values
    a = np.append(n[0], r[1:])
    m = a.size
    k = np.arange(m)
    t = np.tri(m)
    v = np.power(2, (k[:, None] - k) * t) * t
    return pd.Series((v * a).sum(1), d.index)

df.assign(N2=df.groupby('Name', group_keys=False).apply(fill_this))

    Name  Range    N   N2
0  Name1   0.11  0.9  0.9
1  Name1   0.20  0.0  2.0  # <-- 2 * 0.9 + .2
2  Name1   0.30  0.0  4.3  # <-- 2 * 2.0 + .3
3  Name2   0.40  0.3  0.3
4  Name2   0.50  0.0  1.1  # <-- 2 * 0.3 + .5
5  Name3   0.60  0.5  0.5  
5  Name3   0.70  0.0  1.7  # <-- 2 * 0.5 + .7
6  Name3   0.80  0.0  4.2  # <-- 2 * 1.7 + .8

Explanation

Assuming the first value is N0

N1 = 2 * N0 + R1
N2 = 2 * (2 * N0 + R1) + R2
N3 = 2 * (2 * (2 * N0 + R1) + R2) + R3

Or, written another way

N1 = 2 ** 1 * N0 + 2 ** 0 * R1 +      0 * R2
N2 = 2 ** 2 * N0 + 2 ** 1 * R1 + 2 ** 0 * R2
N3 = 2 ** 3 * N0 + 2 ** 2 * R1 + 2 ** 1 * R2

I can see the pattern of powers of 2

1 0 0
2 1 0
3 2 1

I just need to line up my powers of 2, zero out the upper triangle, multiply by the correct array, and sum across rows.

Upvotes: 2

BENY
BENY

Reputation: 323286

df.N=df.N.replace({0:np.nan})
df.N=df.N.ffill()
df.N=(df.groupby('Name').cumcount()*2).replace({0:1})*df.N


df
Out[363]: 
    Name  Range    N
0  Name1   0.11  0.9
1  Name1   0.20  1.8
2  Name1   0.30  3.6
3  Name2   0.40  0.3
4  Name2   0.50  0.6
5  Name3   0.60  0.5
5  Name3   0.70  1.0
6  Name3   0.80  2.0

Upvotes: 3

Andy Hayden
Andy Hayden

Reputation: 375535

I would do this as a transform (of each group).
To calculate the contiguous groups one trick is to use shift/cumsum:

In [11]: df.Name != df.Name.shift()
Out[11]:
0     True
1    False
2    False
3     True
4    False
5     True
5    False
6    False
Name: Name, dtype: bool

In [12]: (df.Name != df.Name.shift()).cumsum()
Out[12]:
0    1
1    1
2    1
3    2
4    2
5    3
5    3
6    3
Name: Name, dtype: int64

In [13]: df["group"] = (df.Name != df.Name.shift()).cumsum()

Then we can "fill" each group (not you can add Ri to this function):

In [21]: def fill_range(start, n):
    ...:     return np.full(n, start) * np.power(2, np.arange(0, n))
    ...:

In [22]: df.groupby("group")["N"].transform(lambda x: fill_range(x.iloc[0], len(x)))
Out[22]:
0    0.9
1    1.8
2    3.6
3    0.3
4    0.6
5    0.5
5    1.0
6    2.0
Name: N, dtype: float64

Upvotes: 3

Related Questions