Caerus

Reputation: 674

Groupby two columns ignoring order of pairs

Suppose we have a dataframe that looks like this:

    start   stop   duration
0   A       B      1
1   B       A      2
2   C       D      2
3   D       C      0

What's the best way to construct a list of: i) start/stop pairs; ii) the count of each start/stop pair; iii) the average duration of each start/stop pair? In this case, order should not matter: (A,B) = (B,A).

Desired output: [[start,stop,count,avg duration]]

In this example: [[A,B,2,1.5],[C,D,2,1]]

Upvotes: 7

Views: 2154

Answers (2)

Divyanshu Srivastava

Reputation: 1507

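This can also be done in one line:

# Series.append was removed in pandas 2.0, so pd.concat is used here to attach the sorted "start,stop" key to each row
df.apply(lambda row: pd.concat([row, pd.Series(','.join(str(v) for v in sorted(row[['start', 'stop']])))]), axis=1).groupby([0]).duration.agg(['count', 'mean'])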

Result:

     count  mean
0               
A,B      2   1.5
C,D      2   1.0
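
The result above is indexed by a joined "A,B" key rather than laid out as the [[start, stop, count, avg duration]] list the question asks for. A minimal sketch of one way to convert it follows; the `stats` frame is a hand-built stand-in for the result shown above, and the name `rows` is only illustrative. It assumes no label contains a comma, since the key is split on that character.

```python
import pandas as pd

# Hand-built stand-in for the aggregated result shown above (assumption).
stats = pd.DataFrame(
    {"count": [2, 2], "mean": [1.5, 1.0]},
    index=["A,B", "C,D"],
)

# Split each "start,stop" key back into two labels and append the statistics.
rows = [
    key.split(",") + [int(count), float(mean)]
    for key, count, mean in stats.itertuples()
]
print(rows)
```

If the labels might contain commas, grouping on two separate sorted columns avoids the ambiguity.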

Upvotes: 0

cs95

Reputation: 402323

Sort the first two columns row-wise so that each pair is in a consistent order (you can do this in place, or on a copy; I've done the former), then groupby and agg:

import numpy as np

df[['start', 'stop']] = np.sort(df[['start', 'stop']], axis=1)

(df.groupby(['start','stop'])
   .duration
   .agg(['count', 'mean'])
   .reset_index()
   .values
   .tolist())
# [['A', 'B', 2, 1.5], ['C', 'D', 2, 1.0]]
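
For the copy-based variant mentioned in this answer, a minimal self-contained sketch might look like the following. The sample frame is rebuilt from the question, and the name `pairs` is an assumption introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "start": ["A", "B", "C", "D"],
    "stop": ["B", "A", "D", "C"],
    "duration": [1, 2, 2, 0],
})

# Work on a copy so the original start/stop order in df is preserved.
pairs = df.copy()
pairs[["start", "stop"]] = np.sort(df[["start", "stop"]].to_numpy(), axis=1)

# Group on the row-wise sorted pair, then count and average the durations.
result = (
    pairs.groupby(["start", "stop"])["duration"]
    .agg(["count", "mean"])
    .reset_index()
    .values.tolist()
)
print(result)
```

Here `df` keeps its original direction information, which matters if the start/stop order is needed later.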

Upvotes: 9
