DJay

Reputation: 1282

Converting sklearn CountVectorizer matrix to list of dictionaries

I have created a matrix using CountVectorizer which looks like

[[1, 2, 1, ...],
 [0, 4, 0, ...],
 [0, 0, 7, ...]]

where each column maps to a feature name:

['sweet', 'pretty', 'bad', ...]

What I want to do

I want to convert the rows of the matrix into a list of dictionaries of the form

[{'sweet': 1, 'pretty': 2, 'bad': 1, ...}, {'sweet': 0, 'pretty': 4, 'bad': 0, ...}, {'sweet': 0, 'pretty': 0, 'bad': 7, ...}]

This is basically what the inverse_transform function of DictVectorizer does. However, since I did not create the matrix from dictionaries, the DictVectorizer was never fitted, so I don't think I can use it. When I try, I get this error:

'DictVectorizer' object has no attribute 'feature_names_'

How do I achieve this? Does NumPy provide a built-in function to convert the array into a list of dictionaries, where I could map each column to a given key?

Upvotes: 1

Views: 4992

Answers (1)

sgDysregulation

Reputation: 4417

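The method you're looking for is get_feature_names (renamed to get_feature_names_out in scikit-learn 1.0; the old name was removed in 1.2).
I'm not sure whether there is a built-in way to achieve what you want, but it's easily achievable with a simple map: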

from sklearn.feature_extraction.text import CountVectorizer

cv = CountVectorizer()

# `data` is an iterable of strings
tdata = cv.fit_transform(data)

# Column labels; on scikit-learn < 1.0 use cv.get_feature_names() instead
ft = cv.get_feature_names_out()

# Create one dictionary per row, with feature names as keys and row elements as values
result = list(map(lambda row: dict(zip(ft, row)), tdata.toarray()))
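To see the mapping step in isolation, here is a minimal sketch that uses the toy matrix and feature names from the question instead of a fitted vectorizer. The helper name rows_to_dicts is made up for illustration. It also calls .tolist() on the array first, so the dictionary values come out as plain Python ints rather than NumPy integers, which tends to matter if you serialize the result (e.g. to JSON).

```python
import numpy as np

# Toy data copied from the question; in practice `matrix` would be
# `tdata.toarray()` and `feature_names` would come from the vectorizer.
matrix = np.array([[1, 2, 1],
                   [0, 4, 0],
                   [0, 0, 7]])
feature_names = ['sweet', 'pretty', 'bad']


def rows_to_dicts(matrix, feature_names):
    """Pair each row's values with the column names (hypothetical helper)."""
    # .tolist() turns the array into nested lists of plain Python ints
    return [dict(zip(feature_names, row)) for row in matrix.tolist()]


result = rows_to_dicts(matrix, feature_names)
print(result)
# [{'sweet': 1, 'pretty': 2, 'bad': 1}, {'sweet': 0, 'pretty': 4, 'bad': 0}, {'sweet': 0, 'pretty': 0, 'bad': 7}]
```

As far as I know NumPy has no built-in for this, but dict(zip(...)) per row does the job.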

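Edit: a memory-saving solution that keeps the data sparse

import pandas as pd

# pd.SparseDataFrame was removed in pandas 1.0; on older versions use
# pd.SparseDataFrame(tdata, columns=ft) instead
df = pd.DataFrame.sparse.from_spmatrix(tdata, columns=ft)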
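If you still need a list of dictionaries but want to avoid calling toarray(), one option is to walk the sparse matrix row by row and keep only the nonzero counts. This is a sketch under assumptions, not something from the original answer: the helper name sparse_rows_to_dicts is made up, and the output differs from the form in the question because zero-count words are left out of each dictionary. The small csr_matrix below stands in for the matrix returned by fit_transform.

```python
import numpy as np
from scipy.sparse import csr_matrix


def sparse_rows_to_dicts(sparse_matrix, feature_names):
    """Build one {feature: count} dict per row, skipping zero entries.

    Hypothetical helper: works on any SciPy sparse matrix by converting it
    to CSR, where each row's nonzero entries are stored contiguously.
    """
    csr = sparse_matrix.tocsr()
    names = list(feature_names)
    result = []
    for i in range(csr.shape[0]):
        # indptr[i]:indptr[i + 1] delimits row i inside indices/data
        start, end = csr.indptr[i], csr.indptr[i + 1]
        cols = csr.indices[start:end].tolist()
        vals = csr.data[start:end].tolist()
        # Guard against explicitly stored zeros, which CSR allows
        result.append({names[j]: v for j, v in zip(cols, vals) if v != 0})
    return result


# Stand-in for the output of CountVectorizer.fit_transform
tdata = csr_matrix(np.array([[1, 2, 1],
                             [0, 4, 0],
                             [0, 0, 7]]))
ft = ['sweet', 'pretty', 'bad']

sparse_result = sparse_rows_to_dicts(tdata, ft)
print(sparse_result)
```

Missing keys can be read back as zero with d.get(word, 0), so no information is lost.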

Upvotes: 1
