I have this dataframe:
source target
0 ape dog
1 ape hous
2 dog hous
3 hors dog
4 hors ape
5 dog ape
6 ape bird
7 ape hous
8 bird hous
9 bird fist
10 bird ape
11 fist ape
I am trying to generate a frequency count with this code:
df_count = df.groupby(['source', 'target']).size().reset_index().sort_values(0, ascending=False)
df_count.columns = ['source', 'target', 'weight']
I get the result below.
source target weight
2 ape hous 2
0 ape bird 1
1 ape dog 1
3 bird ape 1
4 bird fist 1
5 bird hous 1
6 dog ape 1
7 dog hous 1
8 fist ape 1
9 hors ape 1
10 hors dog 1
How can I modify the code so that direction does not matter, i.e. so that instead of ape bird 1 and bird ape 1, I get ape bird 2?
Upvotes: 3
First sort the values within each row, so that every pair is stored in one order regardless of direction.
In [31]: df
Out[31]:
source target
0 ape dog
1 ape hous
2 dog hous
3 hors dog
4 hors ape
5 dog ape
6 ape bird
7 ape hous
8 bird hous
9 bird fist
10 bird ape
11 fist ape
In [32]: df.values.sort()
In [33]: df
Out[33]:
source target
0 ape dog
1 ape hous
2 dog hous
3 dog hors
4 ape hors
5 ape dog
6 ape bird
7 ape hous
8 bird hous
9 bird fist
10 ape bird
11 ape fist
Then groupby on source and target, aggregate by size, and sort the result.
In [34]: df.groupby(['source', 'target']).size().sort_values(ascending=False)
...: .reset_index(name='weight')
Out[34]:
source target weight
0 ape hous 2
1 ape dog 2
2 ape bird 2
3 dog hous 1
4 dog hors 1
5 bird hous 1
6 bird fist 1
7 ape hors 1
8 ape fist 1
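The in-place df.values.sort() above depends on df.values handing back a writable view of the frame's data, which may not hold in every pandas version. Below is a minimal sketch of the same idea that sorts a copy instead, assuming NumPy is available; the names pairs and df_count are only illustrative, and the frame is rebuilt from the question's data.

```python
import numpy as np
import pandas as pd

# Edge list copied from the question.
df = pd.DataFrame({
    "source": ["ape", "ape", "dog", "hors", "hors", "dog",
               "ape", "ape", "bird", "bird", "bird", "fist"],
    "target": ["dog", "hous", "hous", "dog", "ape", "ape",
               "bird", "hous", "hous", "fist", "ape", "ape"],
})

# np.sort returns a sorted copy, so df itself is left untouched.
# Sorting along axis=1 puts each pair in alphabetical order, which makes
# (bird, ape) and (ape, bird) identical.
pairs = pd.DataFrame(
    np.sort(df[["source", "target"]].to_numpy(dtype=object), axis=1),
    columns=["source", "target"],
    index=df.index,
)

# Count each undirected pair and list the heaviest pairs first.
df_count = (
    pairs.groupby(["source", "target"])
    .size()
    .reset_index(name="weight")
    .sort_values("weight", ascending=False)
    .reset_index(drop=True)
)
print(df_count)
```

Because the original df is not modified, the directed edges stay available for anything else that needs them.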
Upvotes: 5
You can first sort each row with apply, and then pass the name parameter to reset_index:
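# result_type='broadcast' keeps the source/target columns; without it,
# pandas 0.23+ returns a Series of lists and the groupby below fails
df_count = df.apply(sorted, axis=1, result_type='broadcast') \
             .groupby(['source', 'target']) \
             .size() \
             .reset_index(name='weight') \
             .sort_values('weight', ascending=False)
print(df_count)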
source target weight
0 ape bird 2
1 ape dog 2
4 ape hous 2
2 ape fist 1
3 ape hors 1
5 bird fist 1
6 bird hous 1
7 dog hors 1
8 dog hous 1
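As a cross-check on the counts, the same "sort each pair, then count" idea can be sketched without pandas, using only the standard library. The edges list is typed in by hand from the question's data, and counts is just an illustrative name.

```python
from collections import Counter

# Directed edges copied from the question's dataframe.
edges = [
    ("ape", "dog"), ("ape", "hous"), ("dog", "hous"), ("hors", "dog"),
    ("hors", "ape"), ("dog", "ape"), ("ape", "bird"), ("ape", "hous"),
    ("bird", "hous"), ("bird", "fist"), ("bird", "ape"), ("fist", "ape"),
]

# tuple(sorted(edge)) is a direction-independent key:
# ("bird", "ape") and ("ape", "bird") both become ("ape", "bird").
counts = Counter(tuple(sorted(edge)) for edge in edges)

# most_common() lists the pairs from highest to lowest weight.
for (source, target), weight in counts.most_common():
    print(source, target, weight)
```

This gives three pairs with weight 2 (ape/bird, ape/dog, ape/hous) and six with weight 1, matching the groupby result.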
Upvotes: 4