Reputation: 2150
I have a dictionary like this:
{'test2':{'hi':4,'bye':3}, 'religion.christian_20674': {'path': 1, 'religious': 1, 'hi':1}}
Each value of this dictionary is itself a dictionary.
This is what my output should look like:
How can I do that efficiently?
I have read this post, but the shape of its matrix is different from mine.
This one was closest to my case, but it had a set inside the dictionary, not another dictionary.
What is different in my question is that I also want to convert the values of the inner dictionaries into the values of the matrix.
I was thinking something like this:
doc_final = [[]]
for item in dic1:
    for item2, value in dic1[item]:
        doc_final[item][item2] = value
But it wasn't the correct way.
Thanks for your help :)
Upvotes: 2
Views: 94
Reputation: 13999
There does not seem to be any built-in way in Pandas or NumPy to split up your rows like you want. Happily, you can do so with a single dictionary comprehension. The splitsubdicts function shown below provides this dict comprehension, and the todf function wraps up the whole conversion process:
import pandas as pd

def splitsubdicts(d):
    # Give every (inner key, value) pair its own one-entry dict, keyed '<outer key>_<position>'
    return {('%s_%d' % (k0, i + 1)): {k1: v1} for k0, v0 in d.items() for i, (k1, v1) in enumerate(v0.items())}

def todf(d):
    # .fillna(0) replaces the missing data with 0 (by default NaN is assigned to missing data)
    return pd.DataFrame(splitsubdicts(splitsubdicts(d))).T.fillna(0)
You can use todf like this:
d = {'Test2': {'hi':4, 'bye':3}, 'religion.christian_20674': {'path': 1, 'religious': 1, 'hi':1}}
df = todf(d)
print(df)
Output:
                              bye   hi  path  religious
Test2_1_1                     0.0  4.0   0.0        0.0
Test2_2_1                     3.0  0.0   0.0        0.0
religion.christian_20674_1_1  0.0  0.0   1.0        0.0
religion.christian_20674_2_1  0.0  0.0   0.0        1.0
religion.christian_20674_3_1  0.0  1.0   0.0        0.0
If you actually want a NumPy array, you can easily convert the dataframe:
arr = df.values
print(arr)
Output:
[[0. 4. 0. 0.]
 [3. 0. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]
 [0. 1. 0. 0.]]
You can also convert the dataframe to a structured array instead, which lets you keep your row and column labels:
arr = df.to_records()
print(arr.dtype.names)
print(arr)
Output:
('index', 'bye', 'hi', 'path', 'religious')
[('Test2_1_1', 0., 4., 0., 0.)
 ('Test2_2_1', 3., 0., 0., 0.)
 ('religion.christian_20674_1_1', 0., 0., 1., 0.)
 ('religion.christian_20674_2_1', 0., 0., 0., 1.)
 ('religion.christian_20674_3_1', 0., 1., 0., 0.)]
splitsubdicts
The nested dictionary comprehension used in splitsubdicts might seem kind of confusing. Really, it's just shorthand for writing nested loops. You can expand the comprehension into a couple of for loops like so:
def splitsubdicts(d):
    ret = {}
    for k0, v0 in d.items():
        for i, (k1, v1) in enumerate(v0.items()):
            ret['{}_{}'.format(k0, i + 1)] = {k1: v1}
    return ret
The values returned by this loop-based version of splitsubdicts will be identical to those returned by the comprehension-based version above. The comprehension-based version might be slightly faster than the loop-based version, but in practical terms the difference is not something anyone should worry about.
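To check the equivalence claim yourself, here is a minimal sketch that puts both versions side by side and compares their results on the example dictionary. The names splitsubdicts_comprehension and splitsubdicts_loops are hypothetical, chosen only so that both definitions can live in one script.

```python
def splitsubdicts_comprehension(d):
    # One-entry dict per (inner key, value) pair, keyed '<outer key>_<position>'
    return {('%s_%d' % (k0, i + 1)): {k1: v1}
            for k0, v0 in d.items()
            for i, (k1, v1) in enumerate(v0.items())}

def splitsubdicts_loops(d):
    # Same logic, written as explicit nested loops
    ret = {}
    for k0, v0 in d.items():
        for i, (k1, v1) in enumerate(v0.items()):
            ret['{}_{}'.format(k0, i + 1)] = {k1: v1}
    return ret

d = {'Test2': {'hi': 4, 'bye': 3},
     'religion.christian_20674': {'path': 1, 'religious': 1, 'hi': 1}}

# Both functions build the same mapping for this input
print(splitsubdicts_comprehension(d) == splitsubdicts_loops(d))
```

Note that the position suffix depends on the order in which the inner keys were inserted, which regular dicts preserve in Python 3.7 and later.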
Upvotes: 1
Reputation: 567
Using the pandas library you can easily turn your dictionary into a matrix.
Code:
import pandas as pd
d = {'test2':{'hi':4,'bye':3}, 'religion.christian_20674': {'path': 1, 'religious': 1, 'hi':1}}
df = pd.DataFrame(d).T.fillna(0)
print(df)
Output:
                          bye   hi  path  religious
test2                     3.0  4.0   0.0        0.0
religion.christian_20674  0.0  1.0   1.0        1.0
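If you would rather have integer counts and a plain NumPy array, a possible variant (a sketch, not the only way to do it) is to build the frame with DataFrame.from_dict and orient='index', which avoids the transpose, then cast after filling the gaps. The variable name matrix is just an illustrative choice, and the column order may differ between pandas versions.

```python
import pandas as pd

d = {'test2': {'hi': 4, 'bye': 3},
     'religion.christian_20674': {'path': 1, 'religious': 1, 'hi': 1}}

# orient='index' turns the outer keys into rows, so no .T is needed
df = pd.DataFrame.from_dict(d, orient='index')

# Missing inner keys come back as NaN; fill them with 0 and cast to integers
df = df.fillna(0).astype(int)

# Plain 2-D NumPy array without the row and column labels
matrix = df.to_numpy()

print(df)
print(matrix)
```

Looking values up by label, for example df.loc['test2', 'hi'], keeps working regardless of the column order.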
Upvotes: 2