djourd1

Reputation: 479

Counting co-occurrences in a list of lists

Using Python, I define the following list of lists of strings:

some_list = [['abc', 'aa', 'xdf'], ['def', 'asd'], ['abc', 'xyz'], ['ghi', 'edd'], ['abc', 'xyz'], ['abc', ]]

What is the best way to select and count the strings that appear in the same sublist as 'abc'? In this example I am looking for an output (dictionary or list) such as:

('aa', 1), ('xdf', 1), ('xyz',2), (None, 1)

The (None, 1) entry would capture the cases where 'abc' is the only string in the sublist.

Upvotes: 1

Views: 80

Answers (4)

xiao

Reputation: 81

My method without importing any packages:

some_list = [['abc', 'aa', 'xdf'], ['def', 'asd'], ['abc', 'xyz'], 
             ['ghi', 'edd'], ['abc', 'xyz'], ['abc', ]]
d = {'None':0}
for e in some_list:
    if 'abc' in e and len(e) != 1:
        for f in e:
            if f != 'abc' and f not in d:
                d[f] = 1
            elif f != 'abc' and f in d:
                d[f] +=1
    elif 'abc' in e and len(e) == 1:
        d['None'] += 1
print(d)

This code will print:

{'None': 1, 'aa': 1, 'xdf': 1, 'xyz': 2}

Upvotes: 1

Henry Ecker

Reputation: 35686

You can filter the list down to the sublists that contain the search value, then remove the search value from each of them to get the co-occurrences. After that, any frequency-counting approach will do.

from collections import Counter
from itertools import chain

some_list = [['abc', 'aa', 'xdf'], ['def', 'asd'], ['abc', 'xyz'], ['ghi', 'edd'], ['abc', 'xyz'], ['abc', ]]
search = 'abc'
co_occurrences = [[v for v in lst if v != search] if len(lst) > 1 else [None] for lst in some_list if search in lst]
print(co_occurrences)
c = Counter(chain.from_iterable(co_occurrences))
print(c)

Output:

[['aa', 'xdf'], ['xyz'], ['xyz'], [None]]
Counter({'xyz': 2, 'aa': 1, 'xdf': 1, None: 1})

Upvotes: 1

georg

Reputation: 215029

I'd count them all first and then handle the edge case separately:

from collections import Counter
from itertools import chain

c = Counter(chain.from_iterable(x for x in some_list if 'abc' in x))
del c['abc']  # 'abc' itself is not a co-occurrence
c[None] = some_list.count(['abc'])
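A self-contained sketch of the same count-then-adjust idea, in case it helps to run it directly. The helper name co_occurrence_counts is hypothetical (it is not part of the answer above), and the sketch assumes the input is the some_list from the question. It also drops the tally for the search value itself, since 'abc' appears in every sublist that gets counted.

```python
from collections import Counter
from itertools import chain


def co_occurrence_counts(lists, search):
    """Count the strings sharing a sublist with `search` (hypothetical helper)."""
    # Count every string in the sublists that contain the search value.
    counts = Counter(chain.from_iterable(x for x in lists if search in x))
    # The search value itself is not a co-occurrence, so drop its tally.
    counts.pop(search, None)
    # Edge case: sublists holding only the search value are recorded under None.
    alone = lists.count([search])
    if alone:
        counts[None] = alone
    return counts


some_list = [['abc', 'aa', 'xdf'], ['def', 'asd'], ['abc', 'xyz'],
             ['ghi', 'edd'], ['abc', 'xyz'], ['abc']]
result = co_occurrence_counts(some_list, 'abc')
print(result)
```

For the list in the question, the result holds the same pairs the question asks for: 'aa' and 'xdf' once each, 'xyz' twice, and None once.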

Upvotes: 3

rahlf23

Reputation: 9019

Here is a longhand approach using defaultdict:

from collections import defaultdict

some_list = [['abc', 'aa', 'xdf'], ['def', 'asd'], ['abc', 'xyz'], ['ghi', 'edd'], ['abc', 'xyz'], ['abc', ]]

d = defaultdict(int)

for i in some_list:
    if 'abc' in i:
        if len(i)==1:
            d[None] += 1
        for j in i:
            if j=='abc': continue
            d[j] += 1

Yields:

defaultdict(<class 'int'>, {'aa': 1, 'xdf': 1, 'xyz': 2, None: 1})

Upvotes: 1
