Reputation: 674
I have a dataframe that looks like this:
start stop
0 1 2
1 3 4
2 2 1
3 4 3
I'm trying to build a dictionary with key= (start, stop) pairs from my list of tuples and the value= count of their occurrence, regardless of the order. In other words, (1,2) and (2,1) would both count as an occurrence of the pair (1,2) in the list of tuples.
Desired output: dict_count= {('1','2'):2, ('3','4'):2}
Here's my attempt:
my_list=[('1','2'),('3','4')]
for pair in my_list:
count=0
if ((df[df['start']]==pair[0] and df[df['end']]==pair[1]) or (df[df['start']]==pair[1]) and df[df['end']]==pair[0])::
count+=1
dict_count[pair]=count
However, this gives me a KeyError:
KeyError: "['1' ...] not in index"
Upvotes: 3
Views: 942
Reputation: 402553
Use collections.Counter
:
>>> from collections import Counter
>>> Counter(map(tuple, np.sort(df[['start','stop']], axis=1)))
{(1, 2): 2, (3, 4): 2}
This does not modify your original DataFrame.
Upvotes: 6
Reputation: 323276
Using values
+ sort
then we do groupby
df.values.sort()
df
start stop
0 '1' '2'
1 '3' '4'
2 '1' '2'
3 '3' '4'
df.groupby(df.columns.tolist()).size()
start stop
'1' '2' 2
'3' '4' 2
dtype: int64
If you need dict
df.groupby(df.columns.tolist()).size().to_dict()
{("'1'", "'2'"): 2, ("'3'", "'4'"): 2}
Update
df['orther']=1
df[['start','stop']]=np.sort(df[['start','stop']].values)
df.groupby(['start','stop']).size().to_dict()
{("'1'", "'2'"): 2, ("'3'", "'4'"): 2}
Upvotes: 5