Reputation: 563
I am trying to use CountVectorizer from sklearn with a list of lists:
Lst=[['apple','peach','mango'],['apple','apple','mango']]
I would like the output to give the count of each word in each list. For example:
0:apple:1
0:peach:1
0:mango:1
1:apple:2
1:peach:0
1:mango:1
or any other format.
I found this post, which is similar to my question, but the answer wasn't complete:
"How should I vectorize the following list of lists with scikit learn?"
Any help is appreciated.
Upvotes: 0
Views: 274
Reputation: 5785
Try this, using Counter from the standard library's collections module:
>>> from collections import Counter
>>> lst=[['apple','peach','mango'],['apple','apple','mango']]
>>> {i:Counter(v) for i,v in enumerate(lst)}
{0: Counter({'apple': 1, 'peach': 1, 'mango': 1}),
1: Counter({'apple': 2, 'mango': 1})}
To get the expected format as a list (words with a count of zero, such as 'peach' in list 1, are not included):
>>> [[i, obj, count] for i,v in enumerate(lst) for obj,count in Counter(v).items()]
[[0, 'apple', 1],
[0, 'peach', 1],
[0, 'mango', 1],
[1, 'apple', 2],
[1, 'mango', 1]]
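If the zero-count rows from the question (such as 1:peach:0) are also needed, one possible approach is to build a shared vocabulary first and then look every word up in each sublist's Counter. This is only a sketch: the names `vocab` and `rows` are introduced here for illustration, and it relies on a Counter returning 0 for keys it has not seen.

```python
from collections import Counter

lst = [['apple', 'peach', 'mango'], ['apple', 'apple', 'mango']]

# Shared vocabulary across all sublists, in first-seen order
# (dict.fromkeys removes duplicates while keeping insertion order).
vocab = list(dict.fromkeys(word for sub in lst for word in sub))

rows = []
for i, sub in enumerate(lst):
    counts = Counter(sub)
    for word in vocab:
        # A Counter returns 0 for missing keys, so absent words get a 0 row.
        rows.append((i, word, counts[word]))

# Print in the "index:word:count" format used in the question.
for i, word, count in rows:
    print(f"{i}:{word}:{count}")
```

Unlike the comprehension above, this yields one row per vocabulary word for every sublist, so both sublists produce three rows.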
Upvotes: 1