Reputation: 9806
I have two lists of strings:
(Pdb) word_list1
['first', 'sentence', 'ant', 'first', 'whatever']
(Pdb) word_list2
['second', 'second', 'heck', 'anything', 'youtube', 'gmail', 'hotmail']
For each of the two lists, I want to compute a probability distribution over the union of the words in both lists, with one entry per word.
(Pdb) print list(set(word_list1) | set(word_list2))
['hotmail', 'anything', 'sentence', 'maybe', 'youtube', 'whatever', 'ant', 'second', 'heck', 'gmail', 'first']
(Pdb) len(list(set(word_list1) | set(word_list2)))
11
So, I want two vectors of length 11, one for each wordlist.
Upvotes: 1
Views: 351
Reputation: 31171
What you need is closer to a dictionary with 11 elements as the result. Use Counter instead of set operations if you are looking for frequencies:
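from collections import Counter
# l1 and l2 are the two word lists
c1, c2 = Counter(l1), Counter(l2)
n = len(l1) + len(l2)
dic = dict(c1 + c2)  # every word in the union, with its combined count
# for the first list: use the word's count in l1 only, so a word shared by
# both lists is not over-counted; float() avoids integer division on Python 2
{k: round(float(c1[k]) / n, 2) if k in c1 else 0 for k in dic}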
#{'ant': 0.09,
# 'anything': 0,
# 'first': 0.18,
# 'gmail': 0,
# 'heck': 0,
# 'hotmail': 0,
# 'second': 0,
# 'sentence': 0.09,
# 'whatever': 0.09,
# 'youtube': 0}
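If you really want two aligned vectors rather than a dictionary, here is a minimal sketch in Python 3. It assumes each list should be normalised by its own length, so that each vector sums to 1. That is my reading of "probability distribution", not something the question states. The helper name distribution is hypothetical, and both lists are assumed to be non-empty.

```python
from collections import Counter

def distribution(words, vocab):
    """Relative frequency of each vocab word within `words` (0.0 if absent)."""
    counts = Counter(words)   # missing words count as 0
    total = len(words)        # assumes `words` is non-empty
    return [counts[w] / total for w in vocab]

word_list1 = ['first', 'sentence', 'ant', 'first', 'whatever']
word_list2 = ['second', 'second', 'heck', 'anything', 'youtube', 'gmail', 'hotmail']

# Sorting the union gives both vectors the same, reproducible word order.
vocab = sorted(set(word_list1) | set(word_list2))

p1 = distribution(word_list1, vocab)
p2 = distribution(word_list2, vocab)

for word, a, b in zip(vocab, p1, p2):
    print(word, round(a, 2), round(b, 2))
```

Position i in p1 and p2 refers to the same word, vocab[i], so the two vectors can be compared element by element.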
Upvotes: 1