Reputation: 479
Using Python, I define the following list of lists of strings:
some_list = [['abc', 'aa', 'xdf'], ['def', 'asd'], ['abc', 'xyz'], ['ghi', 'edd'], ['abc', 'xyz'], ['abc', ]]
What is the best way to select and count the strings that appear in the same sublist as 'abc'? In this example I am looking for an output (dictionary or list) such as:
('aa', 1), ('xdf', 1), ('xyz', 2), (None, 1)
The (None, 1) entry would capture the cases where 'abc' is the only string in the sublist.
Upvotes: 1
Views: 80
Reputation: 81
My method without importing any packages:
some_list = [['abc', 'aa', 'xdf'], ['def', 'asd'], ['abc', 'xyz'],
             ['ghi', 'edd'], ['abc', 'xyz'], ['abc', ]]
d = {'None': 0}
for e in some_list:
    if 'abc' in e and len(e) != 1:
        for f in e:
            if f != 'abc' and f not in d:
                d[f] = 1
            elif f != 'abc' and f in d:
                d[f] += 1
    elif 'abc' in e and len(e) == 1:
        d['None'] += 1
print(d)
This code will print:
{'None': 1, 'aa': 1, 'xdf': 1, 'xyz': 2}
Upvotes: 1
Reputation: 35686
You can filter your list on the search criterion, then filter out the search value to get your co-occurrences. Then you can use any approach to count their frequency.
from collections import Counter
from itertools import chain
some_list = [['abc', 'aa', 'xdf'], ['def', 'asd'], ['abc', 'xyz'], ['ghi', 'edd'], ['abc', 'xyz'], ['abc', ]]
search = 'abc'
co_occurrences = [[v for v in lst if v != search] if len(lst) > 1 else [None] for lst in some_list if search in lst]
print(co_occurrences)
c = Counter(chain.from_iterable(co_occurrences))
print(c)
Output:
[['aa', 'xdf'], ['xyz'], ['xyz'], [None]]
Counter({'xyz': 2, 'aa': 1, 'xdf': 1, None: 1})
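As one example of another way to do the counting step, here is a minimal sketch that uses a plain dict and dict.get instead of Counter. It assumes the co_occurrences value printed above; the names counts, group and value are only illustrative.

```python
# Assumes the co_occurrences list printed above; `counts` is an illustrative name.
co_occurrences = [['aa', 'xdf'], ['xyz'], ['xyz'], [None]]

counts = {}
for group in co_occurrences:
    for value in group:
        # dict.get returns 0 the first time a value is seen
        counts[value] = counts.get(value, 0) + 1

print(counts)  # {'aa': 1, 'xdf': 1, 'xyz': 2, None: 1}
```

Unlike Counter, the plain dict keeps first-seen order when printed rather than sorting by frequency.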
Upvotes: 1
Reputation: 215029
I'd count them all first and then handle the edge case separately:
from collections import Counter
from itertools import chain
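# A is the list of sublists (some_list in the question)
c = Counter(chain.from_iterable(x for x in A if 'abc' in x))
del c['abc']  # the search term itself is not a co-occurrence
c[None] = A.count(['abc'])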
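For anyone who wants to run this against the question's data, here is a self-contained sketch of the same count-first, patch-the-edge-case idea wrapped in a function. The name count_companions and its parameters are hypothetical, not part of the answer. It drops the search term itself from the counts and only adds the None entry when at least one sublist contains nothing but the search term.

```python
from collections import Counter
from itertools import chain


def count_companions(lists, search):
    """Count strings sharing a sublist with `search` (hypothetical helper)."""
    # Count every string in every sublist that contains the search term.
    counts = Counter(chain.from_iterable(x for x in lists if search in x))
    # The search term itself is not a companion, so drop it.
    # Counter's del does not raise if the key is missing.
    del counts[search]
    # Sublists holding only the search term are reported under None.
    alone = lists.count([search])
    if alone:
        counts[None] = alone
    return counts


some_list = [['abc', 'aa', 'xdf'], ['def', 'asd'], ['abc', 'xyz'],
             ['ghi', 'edd'], ['abc', 'xyz'], ['abc']]
result = count_companions(some_list, 'abc')
print(result)
```

Note that lists.count([search]) only matches sublists equal to [search] exactly, so a sublist such as ['abc', 'abc'] would not be counted under None.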
Upvotes: 3
Reputation: 9019
Here is a long-handed approach leveraging defaultdict():
from collections import defaultdict
some_list = [['abc', 'aa', 'xdf'], ['def', 'asd'], ['abc', 'xyz'], ['ghi', 'edd'], ['abc', 'xyz'], ['abc', ]]
d = defaultdict(int)
for i in some_list:
    if 'abc' in i:
        if len(i) == 1:
            d[None] += 1
        for j in i:
            if j == 'abc': continue
            d[j] += 1
Yields:
defaultdict(<class 'int'>, {'aa': 1, 'xdf': 1, 'xyz': 2, None: 1})
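If the list-of-tuples form from the question is preferred, the result can be converted with items(). A small sketch, assuming d holds the contents shown above (the name pairs is illustrative):

```python
from collections import defaultdict

# Assumes d has the contents shown in the output above.
d = defaultdict(int, {'aa': 1, 'xdf': 1, 'xyz': 2, None: 1})

# items() yields (key, count) pairs in insertion order (Python 3.7+).
pairs = list(d.items())
print(pairs)  # [('aa', 1), ('xdf', 1), ('xyz', 2), (None, 1)]
```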
Upvotes: 1