Julian
Julian

Reputation: 613

How two average values of two pandas dataframes, but only if both cells are available?

I have 2 dataframes like this:

df1 = pd.DataFrame({
    "a": [0.5,0.5,0.5],
    "b": [0.5,0.5,0.5],
    "c": [0.5,0.5,0.5]
}, index = ["a", "b", "c"])
df1

enter image description here

df2 = pd.DataFrame({
    "a": [0.3,0.3],
    "b": [0.3,0.3],
}, index = ["a", "b"])
df2

enter image description here

df2 is always a subset of df1.

I want to calculate a weighted average between both if df2 also contains the cell of df1 (same index and column). If not, the value of df1 should remain.

I can do this like this:

average = pd.DataFrame()

for r in df1.columns:
    for c in df1.columns:
        try:
            average.loc[r, c] = 0.4 * df1.loc[r, c] + 0.6 * df2.loc[r, c]
        except KeyError:
            average.loc[r, c] = df1.loc[r, c]
        
average

enter image description here

However, this takes quite a long time for bigger dataframes - is there a quicker way to do it?

Many thanks!

Upvotes: 0

Views: 11

Answers (1)

Quang Hoang
Quang Hoang

Reputation: 150825

Try fillna:

(df1 * 0.4 + df2 * 0.6).fillna(df1)

Output:

      a     b    c
a  0.38  0.38  0.5
b  0.38  0.38  0.5
c  0.50  0.50  0.5

Upvotes: 1

Related Questions