Reputation: 1282
I have created a matrix using CountVectorizer
which looks like
[[1, 2, 1....],
[0, 4, 0,...],
[0, 0, 7....]]
where each column maps to a feature name
['sweet', 'pretty', 'bad'....]
What I want to do is convert the rows of the matrix to a list of dictionaries of the form
[{'sweet': 1, 'pretty': 2, 'bad': 1 ..} , {'sweet': 0, 'pretty': 4, 'bad': 0 ..} , {'sweet': 0, 'pretty': 0, 'bad': 7 ..}]
which is basically what the inverse_transform function of DictVectorizer would do. But since I did not create the matrix from dictionaries, I don't think I can use it; when I try, I get this error:
'DictVectorizer' object has no attribute 'feature_names_'
How do I achieve this? Does NumPy provide a built-in function to convert the array into a list of dictionaries, where I could map each column to a given key?
Upvotes: 1
Views: 4992
Reputation: 4417
The function you're looking for is get_feature_names (renamed to get_feature_names_out in scikit-learn 1.0; the old name was removed in 1.2).
I'm not sure there is a built-in way to achieve what you want, but it's easily achievable with a simple map:
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer()
# `data` is an array of strings
tdata = cv.fit_transform(data)
# scikit-learn >= 1.0; on older versions use cv.get_feature_names()
ft = cv.get_feature_names_out()
# create a dictionary with feature names as keys and row elements as values
result = list(map(lambda row: dict(zip(ft, row)), tdata.toarray()))
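To make the mapping step concrete, here is a minimal sketch that uses the example values from the question in place of real CountVectorizer output. The names `feature_names` and `matrix` are placeholders for illustration, and the matrix is a plain list of lists rather than the array returned by `toarray()`.

```python
# Placeholder data copied from the question (first three columns only)
feature_names = ['sweet', 'pretty', 'bad']
matrix = [
    [1, 2, 1],
    [0, 4, 0],
    [0, 0, 7],
]

# Pair each row's values with the column names, one dictionary per row
result = [dict(zip(feature_names, row)) for row in matrix]

for row_dict in result:
    print(row_dict)
```

With a NumPy array the same comprehension works; calling `row.tolist()` first would give plain Python integers instead of NumPy scalars as the values.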
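Edit: memory-saving solution
import pandas as pd
# pd.SparseDataFrame was removed in pandas 1.0; this accessor keeps the columns sparse
df = pd.DataFrame.sparse.from_spmatrix(tdata, columns=ft)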
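If a list of dictionaries is still needed rather than a DataFrame, one possible way to avoid densifying the whole matrix is to walk the SciPy CSR structure row by row. This is only a sketch under some assumptions: `sparse_rows_to_dicts` is a hypothetical helper, not part of scikit-learn or pandas; the small matrix stands in for the output of `fit_transform`; and, unlike the output requested in the question, zero counts are omitted from each dictionary.

```python
import numpy as np
from scipy.sparse import csr_matrix


def sparse_rows_to_dicts(matrix, feature_names):
    """Hypothetical helper: one dict per row, holding only the nonzero entries."""
    csr = csr_matrix(matrix)  # converts other sparse formats to CSR if needed
    rows = []
    for i in range(csr.shape[0]):
        # indptr marks where row i starts and ends in the indices/data arrays
        start, end = csr.indptr[i], csr.indptr[i + 1]
        rows.append({
            feature_names[j]: int(value)
            for j, value in zip(csr.indices[start:end], csr.data[start:end])
        })
    return rows


# Stand-in for the sparse matrix returned by CountVectorizer.fit_transform
counts = csr_matrix(np.array([[1, 2, 1], [0, 4, 0], [0, 0, 7]]))
names = ['sweet', 'pretty', 'bad']
sparse_result = sparse_rows_to_dicts(counts, names)
print(sparse_result)
```

Missing keys can be read back as zero with `row_dict.get('sweet', 0)`, which keeps memory proportional to the number of nonzero counts.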
Upvotes: 1