AlarmedPigeon

Reputation: 31

How can I find pairs using Python?

Let's say I have data in this format (assume tab delimited)

1   10,11,15
2   12
3   12,11
4   10,11

How can I iterate through the list and count the most popular pairs of objects in the second column? Assume that the second column can have an unlimited number of items.

The ideal output would return something like

pairs count
10,11 (2)
10,15 (1)
11,15 (1)
11,12 (1)

Upvotes: 1

Views: 1454

Answers (3)

Ashwini Chaudhary

Reputation: 250951

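In [6]: from collections import Counter
   ...: from itertools import chain, combinations

In [7]: with open("data1.txt") as f:
   ...:     # drop the leading ID column (tab-separated), then split the rest on commas
   ...:     lis = [map(int, x.split("\t")[1].split(",")) for x in f if x.strip()]
   ...:

In [8]: Counter(chain.from_iterable(combinations(x, 2) for x in lis))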
Out[8]: Counter({(10, 11): 2, (10, 15): 1, (11, 15): 1, (12, 11): 1})

Upvotes: 0

RocketDonkey

Reputation: 37259

These both make the assumption that you can get your input into a list of lists:

If you have Python 2.7, try a Counter in combination with itertools:

>>> from collections import Counter
>>> from itertools import combinations
>>> l = [[10, 11, 15], [12], [12, 11], [10, 11]]
>>> c = Counter(x for sub in l for x in combinations(sub, 2))
>>> for k, v in c.iteritems():
...   print k, v
...
(10, 15) 1
(11, 15) 1
(10, 11) 2
(12, 11) 1

If you have Python 2.6 (which has itertools.combinations but not Counter, added in 2.7), you could use a defaultdict in combination with itertools (a cleaner solution will be provided by one of the gurus, I'm sure).

In [1]: from collections import defaultdict

In [2]: from itertools import combinations

In [3]: l = [[10, 11, 15], [12], [12, 11], [10, 11]]

In [4]: counts = defaultdict(int)

In [5]: for x in l:
   ...:     for item in combinations(x, 2):
   ...:         counts[item] += 1
   ...:
   ...:

In [6]: for k, v in counts.iteritems():
   ...:     print k, v
   ...:
   ...:
(10, 15) 1
(11, 15) 1
(10, 11) 2
(12, 11) 1
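
One thing to watch in both outputs above: combinations keeps the order the items had in the row, so [12, 11] yields (12, 11) rather than the (11, 12) the question asks for, and the same two values in the opposite order would be counted as a different pair. A rough Python 3 sketch of one way around that, assuming order within a pair should not matter, is to sort each row first and let Counter.most_common() handle the "most popular" part. The names rows and pair_counts are only illustrative.

```python
from collections import Counter
from itertools import combinations

# Same sample data as above, as a list of lists
rows = [[10, 11, 15], [12], [12, 11], [10, 11]]

# Sorting each row first means every pair comes out as (smaller, larger),
# so (12, 11) and (11, 12) land on the same key.
pair_counts = Counter(
    pair for row in rows for pair in combinations(sorted(row), 2)
)

# most_common() yields (pair, count) tuples, highest count first
for pair, count in pair_counts.most_common():
    print(pair, count)
```

If a row can contain the same value twice, you would probably also want to de-duplicate it (for example with set(row)) before pairing; that is left out here.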

Upvotes: 5

jdotjdot

Reputation: 17052

You could use combinations and a Counter.

from itertools import combinations
import collections

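newinput = []

# oldinput is assumed to be an iterable of the raw input lines (e.g. an open file).
# Drop the first column: keep only what follows the tab
for line in oldinput:
    newinput.append(line.partition("\t")[2])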

# set up the counter
c = collections.Counter()

for line in newinput:
    # Split by comma
    a = line.split(',')
    # make into integers from string
    a = map(int, a)
    # add to counter
    c.update(combinations(a, 2))

Then you end up with a Counter that holds all of your counts: `(10, 15): 1`, etc.
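
To go from the tab-delimited lines all the way to the output format in the question, something along these lines might work on Python 3. It is only a sketch: count_pairs and sample are made-up names, the sample lines stand in for a real file, and it assumes every non-empty line looks like "id<TAB>comma,separated,ints".

```python
from collections import Counter
from itertools import combinations


def count_pairs(lines):
    """Count unordered pairs of values found in the second (tab-separated) column."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Everything after the first tab is the comma-separated item list
        _, _, items = line.partition("\t")
        values = sorted(int(v) for v in items.split(",") if v.strip())
        counts.update(combinations(values, 2))
    return counts


# Stand-in for the lines of the real input file
sample = ["1\t10,11,15\n", "2\t12\n", "3\t12,11\n", "4\t10,11\n"]
counts = count_pairs(sample)

print("pairs count")
for (a, b), n in counts.most_common():
    print("%d,%d (%d)" % (a, b, n))
```

With a real file you would pass the open file object instead of sample, e.g. count_pairs(open("data1.txt")).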

Upvotes: 0
