Caerus

Reputation: 674

Build a Dict of Counts based on Two Dataframe Columns

I have a dataframe that looks like this:

    start   stop
0   1       2
1   3       4
2   2       1
3   4       3

I'm trying to build a dictionary whose keys are the (start, stop) pairs from my list of tuples and whose values are the number of times each pair occurs, regardless of order. In other words, (1,2) and (2,1) would both count as occurrences of the pair (1,2).

Desired output: dict_count= {('1','2'):2, ('3','4'):2}

Here's my attempt:

my_list=[('1','2'),('3','4')]

for pair in my_list:
    count=0
    if ((df[df['start']]==pair[0] and df[df['end']]==pair[1]) or (df[df['start']]==pair[1]) and df[df['end']]==pair[0]):
        count+=1
    dict_count[pair]=count

However, this gives me a KeyError: KeyError: "['1' ...] not in index"

Upvotes: 3

Views: 942

Answers (2)

cs95

Reputation: 402553

Use collections.Counter:

>>> import numpy as np
>>> from collections import Counter
>>> Counter(map(tuple, np.sort(df[['start','stop']].to_numpy(), axis=1).tolist()))
Counter({(1, 2): 2, (3, 4): 2})

This does not modify your original DataFrame.
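For reference, here is a self-contained sketch of the same idea. It assumes the sample frame from the question holds integers (the question's desired output uses strings, in which case the keys would be strings too), and `sorted_pairs` is just an illustrative name:

```python
from collections import Counter

import numpy as np
import pandas as pd

# Sample frame from the question (integer values are an assumption).
df = pd.DataFrame({"start": [1, 3, 2, 4], "stop": [2, 4, 1, 3]})

# Sort within each row so that (2, 1) and (1, 2) become the same key.
sorted_pairs = np.sort(df[["start", "stop"]].to_numpy(), axis=1)

# .tolist() converts NumPy scalars to plain Python ints before counting.
dict_count = dict(Counter(map(tuple, sorted_pairs.tolist())))

print(dict_count)  # {(1, 2): 2, (3, 4): 2}
```

Wrapping the `Counter` in `dict(...)` is optional; a `Counter` already behaves like a dict.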

Upvotes: 6

BENY

Reputation: 323276

Sort the values within each row using values.sort(), then use groupby:

df.values.sort()
df
  start stop
0   '1'  '2'
1   '3'  '4'
2   '1'  '2'
3   '3'  '4'
df.groupby(df.columns.tolist()).size()
start  stop
'1'    '2'     2
'3'    '4'     2
dtype: int64

If you need a dict:

df.groupby(df.columns.tolist()).size().to_dict()
{("'1'", "'2'"): 2, ("'3'", "'4'"): 2}

Update

import numpy as np

df['other']=1  # an extra column that should not take part in the sort
df[['start','stop']]=np.sort(df[['start','stop']].values)  # sort each row's pair
df.groupby(['start','stop']).size().to_dict()
{("'1'", "'2'"): 2, ("'3'", "'4'"): 2}
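A self-contained sketch of the groupby variant, for anyone who wants to run it end to end. It assumes integer values and a hypothetical extra column named `other`, and it builds a separate `pairs` frame (an illustrative name) so the original DataFrame is left unchanged:

```python
import numpy as np
import pandas as pd

# Sample frame from the question plus a hypothetical extra column.
df = pd.DataFrame({"start": [1, 3, 2, 4], "stop": [2, 4, 1, 3], "other": 1})

# Sort only the two pair columns row by row, into a new frame.
pairs = pd.DataFrame(
    np.sort(df[["start", "stop"]].to_numpy(), axis=1),
    columns=["start", "stop"],
    index=df.index,
)

# Count how often each (start, stop) pair occurs after sorting.
dict_count = pairs.groupby(["start", "stop"]).size().to_dict()

print(dict_count)
```

The resulting keys compare equal to (1, 2) and (3, 4), each with a count of 2.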

Upvotes: 5
