fred.schwartz

Reputation: 2155

pandas dataframe row proportions

I have a dataframe with multiple columns and rows.

For every column, I need each row's value to become 0.5 times that row's value plus 0.5 times the previous row's value.

I currently have a loop that works, but I feel there is a better way that avoids looping. Does anyone have any thoughts?

The dataframe is df_input:

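df_output = df_input.copy()
for i in range(1, df_input.shape[0]):
    try:
        # Average row i with the row before it, across all columns
        df_output.iloc[[i]] = (df_input.iloc[[i-1]] * 0.5).values + (df_input.iloc[[i]] * 0.5).values
    except Exception:
        # Leave the row unchanged if the arithmetic fails (e.g. non-numeric data)
        pass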

Upvotes: 0

Views: 425

Answers (3)

SpghttCd

Reputation: 10880

Do you mean something like this?

First, create some test data:

import numpy as np
import pandas as pd

np.random.seed(42)

df = pd.DataFrame(np.random.randint(0, 20, [5, 3]), columns=['A', 'B', 'C'])

    A   B   C
0   6  19  14
1  10   7   6
2  18  10  10
3   3   7   2
4   1  11   5

Your requested function:

(df*.5).rolling(2).sum()

      A     B     C
0   NaN   NaN   NaN
1   8.0  13.0  10.0
2  14.0   8.5   8.0
3  10.5   8.5   6.0
4   2.0   9.0   3.5
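
As a cross-check, the same balanced average can also be written with shift() or rolling(2).mean(). This is only a minimal sketch that assumes the test data above; the names via_shift, via_mean and like_loop are made up for illustration. The last step assumes you want row 0 kept as-is, which is what the loop in the question does.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.randint(0, 20, [5, 3]), columns=['A', 'B', 'C'])

# Half of each row plus half of the previous row, for all columns at once.
via_shift = 0.5 * df + 0.5 * df.shift()

# A 2-row rolling mean is the same calculation.
via_mean = df.rolling(2).mean()

# Row 0 has no predecessor, so it is NaN; fillna(df) restores the original
# values there, mirroring the loop in the question (assumption: that is wanted).
like_loop = via_shift.fillna(df)
```

Both variants avoid a Python-level function call per window, so they should scale better than rolling(...).apply(...) on large frames.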

EDIT: For unequal weights, you can define a helper function:

def weighted_mean(arr):
    return sum(arr*[.25, .75])

df.rolling(2).apply(weighted_mean, raw=True)

       A      B     C
0    NaN    NaN   NaN
1   9.00  10.00  8.00
2  16.00   9.25  9.00
3   6.75   7.75  4.00
4   1.50  10.00  4.25

EDIT2: ...and if the weights need to be set at runtime:

def weighted_mean(arr, weights=[.5, .5]):
    return sum(arr*weights/sum(weights))

Without the extra argument, it defaults to a balanced mean:

df.rolling(2).apply(weighted_mean, raw=True)

      A     B     C
0   NaN   NaN   NaN
1   8.0  13.0  10.0
2  14.0   8.5   8.0
3  10.5   8.5   6.0
4   2.0   9.0   3.5

An unbalanced mean:

df.rolling(2).apply(weighted_mean, raw=True, args=[[.25, .75]])

       A      B     C
0    NaN    NaN   NaN
1   9.00  10.00  8.00
2  16.00   9.25  9.00
3   6.75   7.75  4.00
4   1.50  10.00  4.25

Dividing by sum(weights) means the weights do not have to sum to one; any ratio works:

df.rolling(2).apply(weighted_mean, raw=True, args=[[1, 3]])

       A      B     C
0    NaN    NaN   NaN
1   9.00  10.00  8.00
2  16.00   9.25  9.00
3   6.75   7.75  4.00
4   1.50  10.00  4.25
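
If apply turns out to be slow, the same two-row weighted mean can probably be expressed with shift() instead. The following is a rough sketch, not part of the answer above: weighted_pair is a hypothetical helper name, and the frame simply repeats the test data by hand.

```python
import numpy as np
import pandas as pd

def weighted_pair(frame, weights=(0.5, 0.5)):
    """Weighted mean of (previous row, current row); hypothetical helper."""
    # Normalise so that any ratio, e.g. (1, 3), behaves like (0.25, 0.75).
    w_prev, w_curr = np.asarray(weights, dtype=float) / sum(weights)
    return w_prev * frame.shift() + w_curr * frame

df = pd.DataFrame({'A': [6, 10, 18, 3, 1],
                   'B': [19, 7, 10, 7, 11],
                   'C': [14, 6, 10, 2, 5]})

result = weighted_pair(df, weights=(1, 3))
```

Here the first weight belongs to the previous row and the second to the current row, matching the order of the values inside each rolling window.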

Upvotes: 1

shmit

Reputation: 2524

Something like below?

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(low=0, high=10, size=(5, 1)), columns=['a'])
df["cumsum_a"] = 0.5*df["a"].cumsum() + 0.5*df["a"]
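
Note that cumsum() accumulates every earlier row, whereas the question asks only for the immediately preceding row, so the two differ from the second row on. A small sketch with made-up values (the column names with_cumsum and with_shift are illustrative) shows the difference:

```python
import pandas as pd

df = pd.DataFrame({"a": [2, 4, 6, 8]})

# cumsum() adds up every row so far, so earlier rows keep contributing.
df["with_cumsum"] = 0.5 * df["a"].cumsum() + 0.5 * df["a"]
# -> 2.0, 5.0, 9.0, 14.0

# shift() exposes only the immediately preceding row.
df["with_shift"] = 0.5 * df["a"].shift() + 0.5 * df["a"]
# -> NaN, 3.0, 5.0, 7.0
```

If the previous-row average is what is wanted, the shift() form is likely the closer match.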

Upvotes: 1

Rocky Li

Reputation: 5958

df.rolling(window=2, min_periods=1).apply(lambda x: x[0]*0.5 + x[1]*0.5 if len(x) > 1 else x[0], raw=True)

This will do the same operation for all columns.

Explanation: rolling applies the lambda to each column separately. For each window, x holds [this_col[i-1], this_col[i]] (only [this_col[0]] for the first row, because min_periods=1), so doing custom arithmetic on the pair is straightforward.
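
To see what each window actually contains, one option is to record the arrays passed to the function. This is only an illustrative sketch: record and windows are made-up names, and raw=True is assumed so that each window arrives as a NumPy array.

```python
import pandas as pd

df = pd.DataFrame({"A": [6, 10, 18]})
windows = []

def record(window):
    # With raw=True the window is a NumPy array, oldest value first.
    windows.append(window.tolist())
    # apply() needs a scalar back; return the newest value unchanged.
    return window[-1]

df.rolling(window=2, min_periods=1).apply(record, raw=True)

# The first window holds a single value because of min_periods=1;
# the last one holds the previous and the current row: [10.0, 18.0].
print(windows)
```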

Upvotes: 1
