Reputation: 31
Let's say I have data in this format (assume tab-delimited):
1 10,11,15
2 12
3 12,11
4 10,11
How can I iterate through the data and count the most common pairs of items in the second column? Assume that the second column can contain any number of items.
The ideal output would be something like:
pairs count
10,11 (2)
10,15 (1)
11,15 (1)
11,12 (1)
Upvotes: 1
Views: 1454
Reputation: 250951
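In [5]: from collections import Counter
In [6]: from itertools import chain, combinations
In [7]: with open("data1.txt") as f:
   ...:     # keep only the second (tab-separated) column, then split it on commas
   ...:     lis = [list(map(int, x.split("\t")[1].split(","))) for x in f]
   ...:
In [8]: Counter(chain.from_iterable(combinations(x, 2) for x in lis))
Out[8]: Counter({(10, 11): 2, (10, 15): 1, (11, 15): 1, (12, 11): 1})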
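One caveat with this approach: combinations keeps the order in which items appear on each line, so a row "11,12" and a row "12,11" would be counted as two different pairs. A minimal sketch, assuming pairs should be treated as unordered, sorts each row before pairing. It reads from an inline sample string instead of a file so it runs on its own; the string DATA and the function name count_pairs are illustrative, not from the answer above.

```python
from collections import Counter
from itertools import chain, combinations

# Illustrative tab-delimited sample matching the question's data.
DATA = "1\t10,11,15\n2\t12\n3\t12,11\n4\t10,11\n"


def count_pairs(lines):
    """Count unordered pairs of items in the second column of each line."""
    rows = []
    for line in lines:
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        # Sort each row so (12, 11) and (11, 12) become the same key.
        rows.append(sorted(int(item) for item in fields[1].split(",")))
    return Counter(chain.from_iterable(combinations(row, 2) for row in rows))


counts = count_pairs(DATA.splitlines())
# most_common() yields (pair, count) tuples, highest count first.
for pair, n in counts.most_common():
    print(pair, n)
```

With the sample data this prints (10, 11) 2 first, then the three pairs that occur once, with the third line's pair reported as (11, 12) as in the question.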
Upvotes: 0
Reputation: 37259
Both of these assume that you can get your input into a list of lists.
If you have Python 2.7, try a Counter in combination with itertools:
>>> from collections import Counter
>>> from itertools import combinations
>>> l = [[10, 11, 15], [12], [12, 11], [10, 11]]
>>> c = Counter(x for sub in l for x in combinations(sub, 2))
>>> for k, v in c.iteritems():
...     print k, v
...
(10, 15) 1
(11, 15) 1
(10, 11) 2
(12, 11) 1
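The loop above uses Python 2 syntax (iteritems() and the print statement). A sketch of the same Counter approach for Python 3 swaps in the print() function and uses most_common(), which returns the pairs ordered from most to least frequent:

```python
from collections import Counter
from itertools import combinations

l = [[10, 11, 15], [12], [12, 11], [10, 11]]

# Count every 2-item combination within each sublist.
c = Counter(pair for sub in l for pair in combinations(sub, 2))

# most_common() yields (pair, count) tuples, highest count first.
for pair, count in c.most_common():
    print(pair, count)
```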
If you have Python 2.6 (which has itertools.combinations but not Counter), you could use a defaultdict in combination with itertools (a cleaner solution will be provided by one of the gurus, I'm sure).
In [1]: from collections import defaultdict
In [2]: from itertools import combinations
In [3]: l = [[10, 11, 15], [12], [12, 11], [10, 11]]
In [4]: counts = defaultdict(int)
In [5]: for x in l:
   ...:     for item in combinations(x, 2):
   ...:         counts[item] += 1
   ...:
   ...:
In [6]: for k, v in counts.iteritems():
   ...:     print k, v
   ...:
   ...:
(10, 15) 1
(11, 15) 1
(10, 11) 2
(12, 11) 1
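itertools.combinations only exists from Python 2.6 onward. Where it is unavailable, the pairs can be generated with two nested index loops instead. A minimal sketch of that alternative, written in Python 3 syntax and using a plain dict in place of defaultdict; the function name count_pairs_manual is hypothetical:

```python
def count_pairs_manual(rows):
    """Count 2-item pairs within each row without itertools.combinations."""
    counts = {}
    for row in rows:
        # Visiting i < j produces each pair of positions once, in the same
        # order that itertools.combinations(row, 2) would.
        for i in range(len(row)):
            for j in range(i + 1, len(row)):
                pair = (row[i], row[j])
                counts[pair] = counts.get(pair, 0) + 1
    return counts


l = [[10, 11, 15], [12], [12, 11], [10, 11]]
result = count_pairs_manual(l)
for pair, n in result.items():
    print(pair, n)
```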
Upvotes: 5
Reputation: 17052
You could use combinations and a Counter.
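from itertools import combinations
import collections
newinput = []
# oldinput is any iterable of the raw tab-delimited lines;
# "data.txt" is a placeholder file name
with open("data.txt") as oldinput:
    # Removes the first column (everything up to and including the first tab)
    for line in oldinput:
        newinput.append(line.partition("\t")[2])
# set up the counter
c = collections.Counter()
for line in newinput:
    # Split by comma
    a = line.split(',')
    # make into integers from string
    a = map(int, a)
    # add every 2-item combination to the counter
    c.update(combinations(a, 2))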
Then you end up with a Counter that has all of your counts, e.g. (10, 15): 1. Calling c.most_common() returns the pairs ordered from most to least frequent.
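To print the counts in the layout the question asks for ("10,11 (2)", most popular first), the Counter can be ordered with most_common() and each pair joined back into a string. This sketch rebuilds c from an illustrative inline list of second-column strings so it runs on its own; the names newinput and c mirror the answer, while report is hypothetical. Because rows are not sorted here, the third line's pair comes out as 12,11.

```python
import collections
from itertools import combinations

# Illustrative second-column values (the text after the tab on each line).
newinput = ["10,11,15\n", "12\n", "12,11\n", "10,11\n"]

c = collections.Counter()
for line in newinput:
    c.update(combinations(map(int, line.split(",")), 2))

report = ["pairs count"]
for pair, count in c.most_common():
    # Render (10, 11) as "10,11 (2)" to match the question's layout.
    report.append("%s (%d)" % (",".join(str(x) for x in pair), count))
print("\n".join(report))
```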
Upvotes: 0