Reputation: 83
response[0]={'@type': 'g:Vertex',
'@value': {'id': 'account-2199023262994',
'label': 'Account',
'properties': {'account_number': [{'@type': 'g:VertexProperty',
'@value': {'id': {'@type': 'g:Int32',
'@value': 544016139},
'value': '0002-1990-2326-2994',
'label': 'account_number'}}],
'last_name': [{'@type': 'g:VertexProperty',
'@value': {'id': {'@type': 'g:Int32',
'@value': -1616372909},
'value': 'Law-Yone',
'label': 'last_name'}}],
'first_name': [{'@type': 'g:VertexProperty',
'@value': {'id': {'@type': 'g:Int32',
'@value': -451458550},
'value': 'Eric',
'label': 'first_name'}}]}}}
I have a nested dictionary within the list which is a part of another nested dictionary. This whole above code snippet is itself an element of list response
.
I tried to flatten this using:
pd.json_normalize(response, meta = ['@type', ['@value', 'id'], ['@value', 'label'], ['@value', 'properties']])
I am expecting to flatten the internal lists of dictionaries also.
Can anyone help me out here? Do I need to iterate through the dataframe or are there any methods available directly?
EDIT: Expected Output as below
Upvotes: 2
Views: 232
Reputation: 71451
You can use a recursive generator function to get the full flattened result, row by row, and then use collections.defaultdict
to group the rows on the account ID. From there, you can grab the desired keys to build your final pd.DataFrame
:
import pandas as pd, collections
data = {'@type': 'g:Vertex', '@value': {'id': 'account-2199023262994', 'label': 'Account', 'properties': {'account_number': [{'@type': 'g:VertexProperty', '@value': {'id': {'@type': 'g:Int32', '@value': 544016139}, 'value': '0002-1990-2326-2994', 'label': 'account_number'}}], 'last_name': [{'@type': 'g:VertexProperty', '@value': {'id': {'@type': 'g:Int32', '@value': -1616372909}, 'value': 'Law-Yone', 'label': 'last_name'}}], 'first_name': [{'@type': 'g:VertexProperty', '@value': {'id': {'@type': 'g:Int32', '@value': -451458550}, 'value': 'Eric', 'label': 'first_name'}}]}}}
def flatten(d, c = [], p = []):
t, f = [], []
for a, b in d.items():
(t if not isinstance(b, (dict, list)) else f).append((p+[a], b))
if not f:
yield {'.'.join(a):b for a, b in (c+t)}
else:
for a, b in f:
if isinstance(b, dict):
yield from flatten(b, c=c+t, p = a)
else:
for i in b:
yield from flatten(i, c=c+t, p = a)
d = collections.defaultdict(dict)
for i in flatten(data):
d[i['@value.id']].update(i)
kv = [['@type', '@type'], ['@value.id', '@value.id'], ['@value.label', '@value.label'], ['@[email protected].@value', 'account_number.id'], ['@[email protected]', 'account_number.value'], ['@[email protected].@value', 'last_name.id'], ['@[email protected]', 'last_name.value'], ['@[email protected].@value', 'first_name.id'], ['@[email protected]', 'first_name.value']]
df = pd.DataFrame([{j:b[k] for k, j in kv} for b in d.values()])
Output:
@type @value.id @value.label ... last_name.value first_name.id first_name.value
0 g:Vertex account-2199023262994 Account ... Law-Yone -451458550 Eric
[1 rows x 9 columns]
Upvotes: 1