Reputation: 1179
i have a dataframe as shown below. The elements in col_1
are connected to the elements in col_2
, giving results in output
. On the other hand, some elements in col_2
are also in col_1
as well. For example: a-->b= 3
and b-->a= 24
.
col_1 col_2 output average
a b 3 13.5 (because a-->b=3 and b-->a=24)
a c 5 3.5 (because a-->c=5 and c-->a=2)
a d 3
b a 24
b c 12
b d 5
c a 2
c b 3
c d 5
What i need is to calculate the average of these two values and of course all similar cases across the dataframe.
You can think of the data as follows: people in col_1 are calling people in col_2. The output is the duration. I want to calculate the average duration between every pair of people
I have tried using pd.merge(df.col_1, df.col_2)
but didn't work.
Any suggestions will be appreciated.
Upvotes: 2
Views: 132
Reputation: 3280
Here is one way to achieve this, though I might be reading too much into your deliberately oversimplified example.
# Add a (temporary) column with the union of col_1 and col_2
df['uniques'] = df[['col_1', 'col_2']].apply(np.unique, axis=1).astype(str)
# Then simply groupby it and average it out
df['mean'] = df.groupby(['uniques']).transform(np.mean)
Output:
col_1 col_2 output uniques mean
0 a b 3 ['a' 'b'] 13.5
1 a c 5 ['a' 'c'] 3.5
2 a d 3 ['a' 'd'] 3.0
3 b a 24 ['a' 'b'] 13.5
4 b c 12 ['b' 'c'] 7.5
5 b d 5 ['b' 'd'] 5.0
6 c a 2 ['a' 'c'] 3.5
7 c b 3 ['b' 'c'] 7.5
8 c d 5 ['c' 'd'] 5.0
Upvotes: 2
Reputation: 150735
You could sort the two columns horizontally and groupby:
s = pd.DataFrame(np.sort(df[['col_1','col_2']].values, axis=1), index=df.index)
df['mean'] = df['output'].groupby([s[0],s[1]]).transform('mean')
Output:
col_1 col_2 output mean
0 a b 3 13.5
1 a c 5 3.5
2 a d 3 3.0
3 b a 24 13.5
4 b c 12 7.5
5 b d 5 5.0
6 c a 2 3.5
7 c b 3 7.5
8 c d 5 5.0
Upvotes: 1