I have a DataFrame that looks like:
import pandas as pd
df = pd.DataFrame([[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
                   [9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0],
                   [17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0]],
                  columns=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'])
A B C D E F G H
0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0
1 9.0 10.0 11.0 12.0 13.0 14.0 15.0 16.0
2 17.0 18.0 19.0 20.0 21.0 22.0 23.0 24.0
And I have a list of columns:
l = ['A', 'C', 'D', 'E']
For each element of my list, I want the mean of the listed columns that precede it in l, plus twice the value in its own column. So A depends only on itself, C depends on A and itself, D depends on the mean of A and C plus itself, and E depends on A, C, D, and itself. I have accomplished what I need in the following way:
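for i, col in enumerate(l):
    # Listed columns that come before col (empty for the first one,
    # so its mean is NaN and tmp_A ends up NaN).
    other_cols = l[:i]
    df['tmp_' + col] = df[other_cols].mean(axis=1) + 2.0 * df[col]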
A B C D E F G H tmp_A tmp_C tmp_D \
0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 NaN 7.0 10.0
1 9.0 10.0 11.0 12.0 13.0 14.0 15.0 16.0 NaN 31.0 34.0
2 17.0 18.0 19.0 20.0 21.0 22.0 23.0 24.0 NaN 55.0 58.0
tmp_E
0 12.666667
1 36.666667
2 60.666667
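As a sanity check on the rule, here is a small plain-Python sketch that recomputes row 0 by hand using only the listed columns. The names row, order and results are illustrative, not part of the original code. It should give NaN for A (the mean of zero columns is undefined, which pandas reports as NaN) and 7.0, 10.0 and about 12.667 for C, D and E, matching the row-0 values of tmp_C, tmp_D and tmp_E above.

```python
# Row 0 of the example frame, restricted to the listed columns.
row = {"A": 1.0, "C": 3.0, "D": 4.0, "E": 5.0}
order = ["A", "C", "D", "E"]

results = {}
for i, col in enumerate(order):
    # Values of the listed columns that come before col.
    previous = [row[c] for c in order[:i]]
    # Mean of zero values is undefined; use NaN as pandas does.
    prev_mean = sum(previous) / len(previous) if previous else float("nan")
    results[col] = prev_mean + 2.0 * row[col]

print(results)
```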
Is there a more Pythonic way to accomplish the same thing without having to run through the for loop?
IIUC, you can use expanding in modern pandas to handle this:
>>> cols = ["A","C","D","E"]
>>> df[cols] * 2 + df[cols].expanding(axis=1).mean().shift(axis=1).fillna(0)
A C D E
0 2.0 7.0 10.0 12.666667
1 18.0 31.0 34.0 36.666667
2 34.0 55.0 58.0 60.666667
This reproduces your expected new columns, except that A becomes twice its original value rather than NaN, because fillna(0) turns the NaNs into 0s.
We can see where this comes from step by step:
Starting from
>>> df[cols]
A C D E
0 1.0 3.0 4.0 5.0
1 9.0 11.0 12.0 13.0
2 17.0 19.0 20.0 21.0
>>> df[cols].expanding(axis=1)
Expanding [min_periods=1,center=False,axis=1]
We can do sum first, because it's easier to check visually:
>>> df[cols].expanding(axis=1).sum()
A C D E
0 1.0 4.0 8.0 13.0
1 9.0 20.0 32.0 45.0
2 17.0 36.0 56.0 77.0
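For data without missing values, this expanding sum is just a running total across the selected columns, so DataFrame.cumsum(axis=1) gives the same numbers without building an Expanding object. A minimal sketch, rebuilding the example frame with np.arange (same values as the question's df); running_sum is an illustrative name:

```python
import numpy as np
import pandas as pd

# Same values as the question's frame: 1..24 in 3 rows of 8 columns A..H.
df = pd.DataFrame(np.arange(1.0, 25.0).reshape(3, 8), columns=list("ABCDEFGH"))
cols = ["A", "C", "D", "E"]

# Running total from left to right across the listed columns.
running_sum = df[cols].cumsum(axis=1)
print(running_sum)
```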
>>> df[cols].expanding(axis=1).mean()
A C D E
0 1.0 2.0 2.666667 3.25
1 9.0 10.0 10.666667 11.25
2 17.0 18.0 18.666667 19.25
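Likewise, each expanding mean is the running sum divided by the number of columns seen so far (1, 2, 3, 4 for A, C, D, E). A sketch of that equivalence, assuming no missing values, since the divisor below counts positions rather than non-null entries; counts and running_mean are illustrative names:

```python
import numpy as np
import pandas as pd

# Same values as the question's frame: 1..24 in 3 rows of 8 columns A..H.
df = pd.DataFrame(np.arange(1.0, 25.0).reshape(3, 8), columns=list("ABCDEFGH"))
cols = ["A", "C", "D", "E"]

# Divisor for each column: how many listed columns have been seen so far.
counts = np.arange(1, len(cols) + 1)  # [1, 2, 3, 4]
running_mean = df[cols].cumsum(axis=1).div(counts, axis="columns")
print(running_mean)
```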
>>> df[cols].expanding(axis=1).mean().shift(axis=1)
A C D E
0 NaN 1.0 2.0 2.666667
1 NaN 9.0 10.0 10.666667
2 NaN 17.0 18.0 18.666667
>>> df[cols].expanding(axis=1).mean().shift(axis=1).fillna(0)
A C D E
0 0.0 1.0 2.0 2.666667
1 0.0 9.0 10.0 10.666667
2 0.0 17.0 18.0 18.666667
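The axis keyword of DataFrame.expanding was deprecated in pandas 2.1, so on recent versions the one-liner above warns and may stop working in a later release. A sketch of the same computation that transposes instead of passing axis=1 is below; it also attaches the results as tmp_* columns via add_prefix and join, mirroring the loop in the question. The names prev_mean and out are illustrative. As with the one-liner, tmp_A comes out as twice A rather than NaN because of fillna(0).

```python
import numpy as np
import pandas as pd

# Same values as the question's frame: 1..24 in 3 rows of 8 columns A..H.
df = pd.DataFrame(np.arange(1.0, 25.0).reshape(3, 8), columns=list("ABCDEFGH"))
cols = ["A", "C", "D", "E"]

# Transpose so the listed columns become rows; expanding() then runs along
# its default axis, so no axis=1 keyword is needed. shift() moves each
# running mean down one row, making it the mean of the *preceding*
# columns (NaN for A, filled with 0 as in the one-liner).
prev_mean = df[cols].T.expanding().mean().shift().T.fillna(0)

# Attach the results as tmp_A, tmp_C, ... like the loop in the question.
out = df.join((df[cols] * 2 + prev_mean).add_prefix("tmp_"))
print(out)
```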