Vectorizing a list of lists with scikit-learn?

Question

I am trying to use CountVectorizer from sklearn with a list of lists.

Lst=[['apple','peach','mango'],['apple','apple','mango']]

I would like the output to give the count of each word in each list. For example:

0:apple:1
0:peach:1
0:mango:1

1:apple:2
1:peach:0
1:mango:1

or any other format.

I found a similar post, "How should I vectorize the following list of lists with scikit learn?", but its answer wasn't complete.

Any help is appreciated.

shaik moeed · Accepted Answer

Try this, using Counter

>>> from collections import Counter
>>> lst=[['apple','peach','mango'],['apple','apple','mango']]

Output:

>>> {i:Counter(v) for i,v in enumerate(lst)}
{0: Counter({'apple': 1, 'peach': 1, 'mango': 1}),
 1: Counter({'apple': 2, 'mango': 1})}
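
Note that the second Counter has no 'peach' entry. A Counter reads as 0 for a key it has never seen, so the zero count from the question is still available by lookup. A small sketch (the names `counts` and `peach_in_second` are introduced here only for illustration):

```python
from collections import Counter

lst = [['apple', 'peach', 'mango'], ['apple', 'apple', 'mango']]
counts = {i: Counter(v) for i, v in enumerate(lst)}

# Looking up a missing key on a Counter does not raise KeyError; it reads as 0.
peach_in_second = counts[1]['peach']
print(peach_in_second)
```

The lookup does not add 'peach' to the Counter, so the printed dict stays the same.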

To get the expected format (as a list):

>>> [[i, obj, count] for i,v in enumerate(lst) for obj,count in Counter(v).items()]
[[0, 'apple', 1],
 [0, 'peach', 1],
 [0, 'mango', 1],
 [1, 'apple', 2],
 [1, 'mango', 1]]
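
The list above skips words with a zero count (there is no [1, 'peach', 0] row). If the zero rows are wanted, as in the question, one option is to build a shared vocabulary first and look every word up in every Counter. This is only a sketch: `vocab`, `rows` and `matrix` are names made up here, and ordering the vocabulary by first appearance is an assumption, not something the question asks for.

```python
from collections import Counter

lst = [['apple', 'peach', 'mango'], ['apple', 'apple', 'mango']]

# Shared vocabulary in order of first appearance
# (dict keys keep insertion order, so duplicates are dropped in place).
vocab = list(dict.fromkeys(word for sub in lst for word in sub))

rows = []    # [list index, word, count], zero counts included
matrix = []  # one count vector per inner list, columns follow vocab
for i, sub in enumerate(lst):
    c = Counter(sub)
    matrix.append([c[word] for word in vocab])
    for word in vocab:
        rows.append([i, word, c[word]])

for row in rows:
    print(row)
```

Here `matrix` holds the same counts as one fixed-length vector per list, which may be more convenient if the result is going to be fed to a model.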
