Averaging values across two different columns

Question

i have a dataframe as shown below. The elements in col_1 are connected to the elements in col_2, giving results in output. On the other hand, some elements in col_2 are also in col_1 as well. For example: a-->b= 3 and b-->a= 24.

col_1   col_2   output      average
a        b       3            13.5   (because a-->b=3 and b-->a=24)
a        c       5             3.5   (because a-->c=5 and c-->a=2)
a        d       3      
b        a       24     
b        c       12     
b        d       5      
c        a       2      
c        b       3      
c        d       5

What i need is to calculate the average of these two values and of course all similar cases across the dataframe.

You can think of the data as follows: people in col_1 are calling people in col_2. The output is the duration. I want to calculate the average duration between every pair of people

I have tried using pd.merge(df.col_1, df.col_2) but didn't work. Any suggestions will be appreciated.

Quang Hoang · Accepted Answer

You could sort the two columns horizontally and groupby:

s = pd.DataFrame(np.sort(df[['col_1','col_2']].values, axis=1), index=df.index)

df['mean'] = df['output'].groupby([s[0],s[1]]).transform('mean')

Output:

  col_1 col_2  output  mean
0     a     b       3  13.5
1     a     c       5   3.5
2     a     d       3   3.0
3     b     a      24  13.5
4     b     c      12   7.5
5     b     d       5   5.0
6     c     a       2   3.5
7     c     b       3   7.5
8     c     d       5   5.0

Averaging values across two different columns

Answers (2)

Related Questions