Taie
Taie

Reputation: 1179

Averaging values across two different columns

i have a dataframe as shown below. The elements in col_1 are connected to the elements in col_2, giving results in output. On the other hand, some elements in col_2 are also in col_1 as well. For example: a-->b= 3 and b-->a= 24.

col_1   col_2   output      average
a        b       3            13.5   (because a-->b=3 and b-->a=24)
a        c       5             3.5   (because a-->c=5 and c-->a=2)
a        d       3      
b        a       24     
b        c       12     
b        d       5      
c        a       2      
c        b       3      
c        d       5

What i need is to calculate the average of these two values and of course all similar cases across the dataframe.

You can think of the data as follows: people in col_1 are calling people in col_2. The output is the duration. I want to calculate the average duration between every pair of people

I have tried using pd.merge(df.col_1, df.col_2) but didn't work. Any suggestions will be appreciated.

Upvotes: 2

Views: 132

Answers (2)

fsl
fsl

Reputation: 3280

Here is one way to achieve this, though I might be reading too much into your deliberately oversimplified example.

# Add a (temporary) column with the union of col_1 and col_2 
df['uniques'] = df[['col_1', 'col_2']].apply(np.unique, axis=1).astype(str) 


# Then simply groupby it and average it out 
df['mean'] = df.groupby(['uniques']).transform(np.mean)   

Output:

  col_1 col_2  output    uniques  mean
0     a     b       3  ['a' 'b']  13.5
1     a     c       5  ['a' 'c']   3.5
2     a     d       3  ['a' 'd']   3.0
3     b     a      24  ['a' 'b']  13.5
4     b     c      12  ['b' 'c']   7.5
5     b     d       5  ['b' 'd']   5.0
6     c     a       2  ['a' 'c']   3.5
7     c     b       3  ['b' 'c']   7.5
8     c     d       5  ['c' 'd']   5.0

Upvotes: 2

Quang Hoang
Quang Hoang

Reputation: 150735

You could sort the two columns horizontally and groupby:

s = pd.DataFrame(np.sort(df[['col_1','col_2']].values, axis=1), index=df.index)

df['mean'] = df['output'].groupby([s[0],s[1]]).transform('mean')

Output:

  col_1 col_2  output  mean
0     a     b       3  13.5
1     a     c       5   3.5
2     a     d       3   3.0
3     b     a      24  13.5
4     b     c      12   7.5
5     b     d       5   5.0
6     c     a       2   3.5
7     c     b       3   7.5
8     c     d       5   5.0

Upvotes: 1

Related Questions