Spencer

Reputation: 2566

How to convert dictionary to matrix in python?

I have a dictionary like this:

{device1: (news1, news2, ...), device2: (news2, news4, ...), ...}

How can I convert it into a 2-D 0/1 matrix in Python that looks like this:

         news1 news2 news3 news4
device1    1     1     0      0
device2    0     1     0      1
device3    1     0     0      1

Upvotes: 8

Views: 29398

Answers (3)

mgrogger

Reputation: 304

Adding to this, since I think the previous answers assume your data is structured differently and don't directly address your issue.

Assuming I'm understanding your data structure correctly and the names of the indices in your matrix don't really matter:

from sklearn.feature_extraction import DictVectorizer

# Named device_news rather than `dict` so the built-in dict type is not shadowed
device_news = {'device1': ['news1', 'news2'],
               'device2': ['news2', 'news4'],
               'device3': ['news1', 'news4']}

restructured = []

for key in device_news:
    data_dict = {}
    for news in device_news[key]:
        data_dict[news] = 1
    # news3 never appears in the data, so add it explicitly to get its column
    data_dict['news3'] = 0
    restructured.append(data_dict)

#restructured should now look like
'''
[{'news1':1, 'news2':1, 'news3':0},
 {'news2':1, 'news4':1, 'news3':0},
 {'news1':1, 'news4':1, 'news3':0}]
'''

dictvectorizer = DictVectorizer(sparse=False)
features = dictvectorizer.fit_transform(restructured)

print(features)

#output (columns are sorted by feature name by default)
'''
[[1. 1. 0. 0.]
 [0. 1. 0. 1.]
 [1. 0. 0. 1.]]
'''
# get_feature_names() was deprecated in scikit-learn 1.0 and removed in 1.2
print(dictvectorizer.get_feature_names_out())
#output
'''
['news1' 'news2' 'news3' 'news4']
'''
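If the full list of news columns is known up front, scikit-learn's `MultiLabelBinarizer` (a different class from the `DictVectorizer` used above, swapped in here) accepts each device's news collection directly and takes the column list through its `classes` parameter, so the `'news3': 0` entry does not have to be added by hand. A minimal sketch, assuming the `all_news` list below is the complete, ordered set of columns:

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Example data in the question's shape: device -> collection of news items
device_news = {
    "device1": ("news1", "news2"),
    "device2": ("news2", "news4"),
    "device3": ("news1", "news4"),
}

# Assumption: the complete column list is known in advance. news3 never
# appears in the data, so it can only get a column if it is listed here.
all_news = ["news1", "news2", "news3", "news4"]
devices = list(device_news)  # row order = dict insertion order (Python 3.7+)

mlb = MultiLabelBinarizer(classes=all_news)
features = mlb.fit_transform([device_news[d] for d in devices])

print(devices)
print(list(mlb.classes_))
print(features)
```

Here `features` is a dense integer array whose columns follow `all_news`, giving the rows `[1 1 0 0]`, `[0 1 0 1]` and `[1 0 0 1]` for device1 to device3.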

Upvotes: 4

tolgabuyuktanir

Reputation: 687

Here is another option for converting a list of dictionaries to a matrix:

# Load library
from sklearn.feature_extraction import DictVectorizer

# Our data: a list of dictionaries, one per row
data_dict = [{'Red': 2, 'Blue': 4},
             {'Red': 4, 'Blue': 3},
             {'Red': 1, 'Yellow': 2},
             {'Red': 2, 'Yellow': 2}]
# Create DictVectorizer object
dictvectorizer = DictVectorizer(sparse=False)

# Convert dictionary into feature matrix
features = dictvectorizer.fit_transform(data_dict)
print(features)
#output
'''
[[4. 2. 0.]
 [3. 4. 0.]
 [0. 1. 2.]
 [0. 2. 2.]]
'''
# get_feature_names() was deprecated in scikit-learn 1.0 and removed in 1.2
print(dictvectorizer.get_feature_names_out())
#output
'''
['Blue' 'Red' 'Yellow']
'''
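To see what `DictVectorizer(sparse=False)` is doing for numeric values like these, here is a plain NumPy sketch of the same idea: collect every key that appears in any dictionary, sort the keys to get the column order, and leave missing keys at 0. The helper name `dicts_to_matrix` is hypothetical, and unlike `DictVectorizer` it does not one-hot encode string values; it only handles numbers.

```python
import numpy as np


def dicts_to_matrix(records):
    """Hypothetical helper: dense matrix from a list of dicts with numeric values."""
    # Every key seen in any record becomes a column; sorting mirrors
    # DictVectorizer's default column order (sort=True).
    feature_names = sorted({key for record in records for key in record})
    column = {name: j for j, name in enumerate(feature_names)}

    # Start from zeros so keys missing from a record stay 0.0
    matrix = np.zeros((len(records), len(feature_names)))
    for i, record in enumerate(records):
        for key, value in record.items():
            matrix[i, column[key]] = value
    return matrix, feature_names


data_dict = [{'Red': 2, 'Blue': 4},
             {'Red': 4, 'Blue': 3},
             {'Red': 1, 'Yellow': 2},
             {'Red': 2, 'Yellow': 2}]

features, names = dicts_to_matrix(data_dict)
print(features)
print(names)
```

For this input the helper produces the same rows and the same `Blue`, `Red`, `Yellow` column order as the `DictVectorizer` output above.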

Upvotes: 3

Robbie

Reputation: 4882

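Here is some code that creates a matrix (a 2-D array) using the numpy package. Note that we use an explicit list of the names because, before Python 3.7, dictionaries did not guarantee that keys/values come back in the order they were inserted; the list also fixes the row order regardless of how the dictionary was built. This assumes each device already maps to its row of 0/1 values.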

import numpy as np

# Rows match the matrix in the question
dataDict = {'device1':(1,1,0,0), 'device2':(0,1,0,1), 'device3':(1,0,0,1)}
orderedNames = ['device1','device2','device3']

# Stack the rows in the order given by orderedNames
dataMatrix = np.array([dataDict[i] for i in orderedNames])

print(dataMatrix)

The output is:

[[1 1 0 0]
 [0 1 0 1]
 [1 0 0 1]]
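The numpy approach expects rows that are already 0/1 tuples, while the question starts from a dictionary that maps each device to its news items. A minimal sketch that builds those rows first, assuming the full ordered list of news columns (`ordered_news`, a name introduced here for illustration) is known, since news3 never appears in the data and cannot be discovered from it:

```python
import numpy as np

# The question's format: device -> collection of news items (example values)
device_news = {
    'device1': ('news1', 'news2'),
    'device2': ('news2', 'news4'),
    'device3': ('news1', 'news4'),
}

# Assumption: the complete, ordered column list is supplied by hand
ordered_news = ['news1', 'news2', 'news3', 'news4']
ordered_names = ['device1', 'device2', 'device3']

# One row per device: 1 if the news item is in that device's collection, else 0
data_matrix = np.array([[int(news in device_news[name]) for news in ordered_news]
                        for name in ordered_names])
print(data_matrix)
```

This prints the same matrix as the question, with rows in `ordered_names` order and columns in `ordered_news` order. For long news lists, converting each collection to a `set` first makes the membership test cheaper.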

Upvotes: 6
