Probability distribution of two lists of words

Question

I have two lists of strings:

(Pdb) word_list1
['first', 'sentence', 'ant', 'first', 'whatever']
(Pdb) word_list2
['second', 'second', 'heck', 'anything', 'youtube', 'gmail', 'hotmail']

For each of the two lists, I want to compute the probability distribution over the union of the words in both lists, with one probability per word.

(Pdb) print list(set(word_list1) | set(word_list2))
['hotmail', 'anything', 'sentence', 'maybe', 'youtube', 'whatever', 'ant', 'second', 'heck', 'gmail', 'first']
(Pdb) len(list(set(word_list1) | set(word_list2)))
11

So I want two vectors of length 11, one for each word list.
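
One possible reading of this, sketched below: treat each probability as the word's relative frequency within its own list, so a word that never occurs in a list gets 0. The helper name `distribution` and the sorted `vocabulary` ordering are assumptions made for illustration, not part of the original code. With the two lists exactly as printed above, the union contains 10 words ('maybe' appears in the printed union but in neither list shown), so this sketch yields vectors of length 10.

```python
from collections import Counter

word_list1 = ['first', 'sentence', 'ant', 'first', 'whatever']
word_list2 = ['second', 'second', 'heck', 'anything', 'youtube', 'gmail', 'hotmail']

# Sort the union so both vectors share one fixed word order
# (a plain set has no guaranteed ordering).
vocabulary = sorted(set(word_list1) | set(word_list2))


def distribution(words, vocabulary):
    """Return the relative frequency of each vocabulary word within `words`.

    Hypothetical helper; assumes `words` is non-empty.
    """
    counts = Counter(words)      # missing words count as 0
    total = float(len(words))    # float() avoids integer division on Python 2
    return [counts[word] / total for word in vocabulary]


dist1 = distribution(word_list1, vocabulary)
dist2 = distribution(word_list2, vocabulary)

# One row per union word: the word, then its probability in each list.
for word, p1, p2 in zip(vocabulary, dist1, dist2):
    print("%-10s %.3f %.3f" % (word, p1, p2))
```

Each vector sums to 1 because every word of a list is also in the union. If zero probabilities are a problem later (for example when computing a KL divergence between the two vectors), some form of smoothing would be needed, which this sketch does not do.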
