Weighted average of dataframes with mask on NaN's

Question

I have found some answers about averaging dataframes, but none that handle weights. I have figured out a way to get the result I want (see title), but I wonder if there is a more direct way of achieving the same goal.

EDIT: I need to average more than just two dataframes; however, the example code below only includes two of them.

import pandas as pd
import numpy as np

df1 = pd.DataFrame([[np.nan, 2, np.nan, 0],
                    [3, 4, np.nan, 1],
                    [np.nan, np.nan, np.nan, 5],
                    [np.nan, 3, np.nan, 4]],
                   columns=list('ABCD'))

df2 = pd.DataFrame([[3, 1, np.nan, 1],
                    [2, 5, np.nan, 3],
                    [np.nan, 4, np.nan, 2],
                    [np.nan, 2, 1, 5]],
                   columns=list('ABCD'))

What I do is:

Transform each dataframe into an array of arrays (rows), then put all the transformed dataframes into one array:

def fromDfToArraysStack(df):
    for i in range(len(df)):
        arrayRow = df.iloc[i].values

        if i == 0:
            arraysStack = arrayRow
        else:
            arraysStack = np.vstack((arraysStack, arrayRow))

    return arraysStack

arraysStack1 = fromDfToArraysStack(df1)
arraysStack2 = fromDfToArraysStack(df2)
arrayOfArrays = np.array([arraysStack1, arraysStack2])

Apply a mask to the NaNs and take the weighted average:

masked = np.ma.masked_array(arrayOfArrays,
                            np.isnan(arrayOfArrays))
arrayAve = np.ma.average(masked,
                         axis = 0,
                         weights = [1,2])

Transform back to a dataframe while putting the NaNs back in:

pd.DataFrame(np.row_stack(arrayAve.filled(np.nan)))

    0           1           2   3
0   3.000000    1.333333    NaN 0.666667
1   2.333333    4.666667    NaN 2.333333
2   NaN         4.000000    NaN 3.000000
3   NaN         2.333333    1.0 4.666667

As I said, this works, but hopefully there is a more concise way to do this. A one-liner, anybody?

Clade · Accepted Answer

To make it a tidy one-liner, I cheated a little with the imports, but here is the best I could do:

import pandas as pd
import numpy as np
from numpy.ma import average as avg
from numpy.ma import masked_array as ma

df1 = pd.DataFrame([[np.nan, 2, np.nan, 0],
                    [3, 4, np.nan, 1],
                    [np.nan, np.nan, np.nan, 5],
                    [np.nan, 3, np.nan, 4]],
                   columns=list('ABCD'))

df2 = pd.DataFrame([[3, 1, np.nan, 1],
                    [2, 5, np.nan, 3],
                    [np.nan, 4, np.nan, 2],
                    [np.nan, 2, 1, 5]],
                   columns=list('ABCD'))

df1.combine(df2, lambda x, y: avg([ma(x, np.isnan(x)), ma(y, np.isnan(y))], 0, [1, 2]))
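
`combine` calls the lambda once per column, passing the matching columns of the two dataframes. To see what the lambda computes, here is a minimal sketch for a single column (column B of the example). The names `col1`, `col2` and `col_avg` are mine, chosen only for illustration.

```python
import numpy as np
from numpy.ma import average as avg
from numpy.ma import masked_array as ma

# Column B of df1 and df2 from the example, as plain float arrays
col1 = np.array([2, 4, np.nan, 3], dtype=float)
col2 = np.array([1, 5, 4, 2], dtype=float)

# Same call as inside the lambda: mask the NaNs, then average
# along axis 0 with weights 1 (for df1) and 2 (for df2)
col_avg = avg([ma(col1, np.isnan(col1)), ma(col2, np.isnan(col2))], 0, [1, 2])

# Row 2 is masked in col1, so only col2 contributes there:
# approximately [1.333, 4.667, 4.0, 2.333]
print(col_avg)
```

A masked entry drops out of both the weighted sum and the sum of weights, which is why row 2 comes out as 4.0 rather than NaN.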

EDIT: For more than two dataframes, a small helper function generalizes the same idea:

import pandas as pd
import numpy as np
from numpy.ma import average as avg
from numpy.ma import masked_array as ma

df1 = pd.DataFrame([[np.nan, 2, np.nan, 0],
                    [3, 4, np.nan, 1],
                    [np.nan, np.nan, np.nan, 5],
                    [np.nan, 3, np.nan, 4]],
                   columns=list('ABCD'))

df2 = pd.DataFrame([[3, 1, np.nan, 1],
                    [2, 5, np.nan, 3],
                    [np.nan, 4, np.nan, 2],
                    [np.nan, 2, 1, 5]],
                   columns=list('ABCD'))

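def df_average(dfs, wgts):
    # Mask the NaNs in each dataframe, then take the weighted average across dataframes.
    # Note: the result gets default integer column labels, not 'A'..'D'.
    return pd.DataFrame(avg([ma(df.values, np.isnan(df.values)) for df in dfs], 0, wgts))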


df_average(dfs=[df1, df2], wgts=[1, 2])
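
If keeping the column labels matters, the same result can probably be had without masked arrays. This is only a sketch of an alternative, not the approach above: the function name `df_weighted_nanmean` is made up for illustration, and it assumes all dataframes share the same index and columns. It divides the weighted sum of the non-NaN values by the sum of the weights that actually contributed to each cell.

```python
import numpy as np
import pandas as pd

def df_weighted_nanmean(dfs, wgts):
    """Weighted average of identically labelled dataframes, ignoring NaN cells."""
    # Weighted sum of the values, with NaN cells contributing 0
    num = sum(df.fillna(0) * w for df, w in zip(dfs, wgts))
    # Sum of the weights of the non-NaN cells only
    den = sum(df.notna() * w for df, w in zip(dfs, wgts))
    # Cells that are NaN in every input have den == 0; make them NaN
    return num / den.where(den != 0)

df1 = pd.DataFrame([[np.nan, 2, np.nan, 0],
                    [3, 4, np.nan, 1],
                    [np.nan, np.nan, np.nan, 5],
                    [np.nan, 3, np.nan, 4]],
                   columns=list('ABCD'))

df2 = pd.DataFrame([[3, 1, np.nan, 1],
                    [2, 5, np.nan, 3],
                    [np.nan, 4, np.nan, 2],
                    [np.nan, 2, 1, 5]],
                   columns=list('ABCD'))

result = df_weighted_nanmean([df1, df2], [1, 2])
print(result)
```

On the example data this gives the same numbers as the table in the question, with columns A to D preserved.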
