Reputation: 2155
I have a dataframe with multiple columns and rows.
For all columns, I need each row's value to become 0.5 times that row's value plus 0.5 times the previous row's value.
I currently have a loop that works, but I feel there is a better way that avoids the loop. Does anyone have any thoughts?
dataframe = df_input
df_output=df_input.copy()
for i in range(1, df_input.shape[0]):
    try:
        df_output.iloc[[i]] = (df_input.iloc[[i-1]]*(1/2)).values + (df_input.iloc[[i]]*(1/2)).values
    except:
        pass
Upvotes: 0
Views: 425
Reputation: 10880
Do you mean something like this?
First, create some test data:
np.random.seed(42)
df = pd.DataFrame(np.random.randint(0, 20, [5, 3]), columns=['A', 'B', 'C'])
A B C
0 6 19 14
1 10 7 6
2 18 10 10
3 3 7 2
4 1 11 5
Your requested function:
(df*.5).rolling(2).sum()
A B C
0 NaN NaN NaN
1 8.0 13.0 10.0
2 14.0 8.5 8.0
3 10.5 8.5 6.0
4 2.0 9.0 3.5
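As a cross-check, here is a minimal sketch of the same calculation written with shift, which spells out the "half of this row plus half of the previous row" reading. The seeded random frame is replaced by its literal values so the snippet is self-contained. Filling the first row from the input (as the loop in the question effectively does) is an assumption about the desired edge handling, and the names rolled, shifted and filled are only illustrative.

```python
import numpy as np
import pandas as pd

# Literal copy of the test data shown above
df = pd.DataFrame({'A': [6, 10, 18, 3, 1],
                   'B': [19, 7, 10, 7, 11],
                   'C': [14, 6, 10, 2, 5]})

# Rolling version: halve everything, then sum each pair of consecutive rows
rolled = (df * .5).rolling(2).sum()

# Shift version: half of the current row plus half of the previous row
shifted = 0.5 * df + 0.5 * df.shift(1)

# Both leave NaN in row 0; keep the original first row instead (assumption)
filled = shifted.fillna(df)
print(filled)
```

Both variants give the same numbers; which one reads better is mostly a matter of taste.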
EDIT: for an unbalanced sum you can define an auxiliary function:
def weighted_mean(arr):
    return sum(arr*[.25, .75])
df.rolling(2).apply(weighted_mean, raw=True)
A B C
0 NaN NaN NaN
1 9.00 10.00 8.00
2 16.00 9.25 9.00
3 6.75 7.75 4.00
4 1.50 10.00 4.25
EDIT2: ...and if the weights should be set at runtime:
def weighted_mean(arr, weights=[.5, .5]):
    return sum(arr*weights/sum(weights))
No additional argument defaults to balanced mean:
df.rolling(2).apply(weighted_mean, raw=True)
A B C
0 NaN NaN NaN
1 8.0 13.0 10.0
2 14.0 8.5 8.0
3 10.5 8.5 6.0
4 2.0 9.0 3.5
An unbalanced mean:
df.rolling(2).apply(weighted_mean, raw=True, args=[[.25, .75]])
A B C
0 NaN NaN NaN
1 9.00 10.00 8.00
2 16.00 9.25 9.00
3 6.75 7.75 4.00
4 1.50 10.00 4.25
The division by sum(weights) allows the weights to be given as any ratio, not only as fractions that sum to one:
df.rolling(2).apply(weighted_mean, raw=True, args=[[1, 3]])
A B C
0 NaN NaN NaN
1 9.00 10.00 8.00
2 16.00 9.25 9.00
3 6.75 7.75 4.00
4 1.50 10.00 4.25
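For what it's worth, np.average also normalizes by the sum of the weights, so an alternative helper could be sketched as below. This is a swapped-in variant, not the function above; the tuple default and the names by_ratio and by_fraction are my own choices for illustration.

```python
import numpy as np
import pandas as pd

# Literal copy of the test data shown above
df = pd.DataFrame({'A': [6, 10, 18, 3, 1],
                   'B': [19, 7, 10, 7, 11],
                   'C': [14, 6, 10, 2, 5]})

def weighted_mean(arr, weights=(.5, .5)):
    # np.average divides by sum(weights), so any ratio works
    return np.average(arr, weights=weights)

# The ratio 1:3 and the fractions .25/.75 describe the same weighting
by_ratio = df.rolling(2).apply(weighted_mean, raw=True, args=([1, 3],))
by_fraction = df.rolling(2).apply(weighted_mean, raw=True, args=([.25, .75],))
print(by_ratio)
```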
Upvotes: 1
Reputation: 2524
Something like below?
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(low=0, high=10, size=(5, 1)), columns=['a'])
df["cumsum_a"] = 0.5*df["a"].cumsum() + 0.5*df["a"]
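One thing to keep in mind: cumsum accumulates every earlier row, not just the previous one, so this differs from "half of this row plus half of the previous row". A small sketch of the difference, using made-up values and a hypothetical column name prev_avg_a for the shift-based variant:

```python
import pandas as pd

# Small literal example (values chosen only for illustration)
df = pd.DataFrame({'a': [1, 2, 3, 4]})

# cumsum adds up all rows so far: 1, 3, 6, 10
df["cumsum_a"] = 0.5 * df["a"].cumsum() + 0.5 * df["a"]

# shift(1) looks exactly one row back; the first row has no predecessor (NaN)
df["prev_avg_a"] = 0.5 * df["a"] + 0.5 * df["a"].shift(1)
print(df)
```

If only the previous row should contribute, the shift-based column is probably closer to what the question asks for.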
Upvotes: 1
Reputation: 5958
df.rolling(window=2, min_periods=1).apply(lambda x: x[0]*0.5 + x[1]*0.5 if len(x) > 1 else x[0], raw=True)
This will do the same operation for all columns.
Explanation: rolling applies the lambda to each column separately. For each window, x holds two consecutive values of that column, structured like [this_col[i], this_col[i+1]], so doing custom arithmetic on them is straightforward.
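To see what the function actually receives, here is a minimal sketch that records each window. It assumes raw=True, so that x arrives as a NumPy array and x[0], x[1] are positional; the helper name half_half and the seen list are only for illustration.

```python
import pandas as pd

# First three rows of a small two-column example
df = pd.DataFrame({'A': [6, 10, 18], 'B': [19, 7, 10]})

seen = []  # windows passed to the function, kept for inspection

def half_half(x):
    seen.append(x.tolist())
    # A one-element window only occurs for the first row (min_periods=1)
    return 0.5 * x[0] + 0.5 * x[1] if len(x) > 1 else x[0]

out = df.rolling(window=2, min_periods=1).apply(half_half, raw=True)
print(seen)
print(out)
```

Each recorded window contains values from a single column only, which is why the same arithmetic carries over to every column.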
Upvotes: 1