Reputation: 113
I am working with a JSON response that is formatted like a many-nested dictionary below:
{u'addresses': [],
u'application_ids': [20855193],
u'applications': [{u'answers': [{u'answer': u'Indeed ',
u'question': u'How did you hear?'}],
u'applied_at': u'2015-10-29T22:19:04.925Z',
u'candidate_id': 9999999,
u'credited_to': None,
u'current_stage': {u'id': 9999999,
u'name': u'Application Review'},
u'id': 9999999,
u'jobs': [{u'id': 9999999,u'name': u'ENGINEER'}],
u'last_activity_at': u'2015-10-29T22:19:04.767Z',
u'prospect': False,
u'rejected_at': None,
u'rejection_details': None,
u'rejection_reason': None,
u'source': {u'id': 7, u'public_name': u'Indeed'},
u'status': u'active'}],
u'attachments': [{u'filename': u'Jason_Bourne.pdf',
u'type': u'resume',
u'url': u'https://resumeURL'}],
u'company': None,
u'coordinator': {u'employee_id': None,
u'id': 9999999,
u'name': u'Batman_Robin'},
u'email_addresses': [{u'type': u'personal',
u'value': u'[email protected]'}],
u'first_name': u'Jason',
u'id': 9999999,
u'last_activity': u'2015-10-29T22:19:04.767Z',
u'last_name': u'Bourne',
u'website_addresses': []}
I am trying to flatten the JSON into a table and have found the following example on the pandas documentation:
http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.io.json.json_normalize.html
For some reason, any data directly under the 'applications' header returns as one character per row. For example, if I call:
timeapplied = json_normalize(data,['applications', ['applied_at']])
I get:
0
0 2
1 0
2 1
3 5
4 -
5 1
6 0
7 -
8 2
9 9
10 T
11 2
12 2
13 :
14 1
15 9
16 :
17 0
18 4
19 .
20 9
21 2
22 5
23 Z
Is there any way around this so I can use the normalize function?
Thanks!
Upvotes: 2
Views: 869
Reputation: 379
Your call:
timeapplied = json_normalize(data,['applications', ['applied_at']])
A call to json_normalize consists of the parameters shown below,
pandas.io.json.json_normalize(data, record_path=None, meta=None, meta_prefix=None, record_prefix=None)
You are passing ['applications', ['applied_at']]
as the record_path. Apparently, this means that the data provided under data['applications]['applied_at']
is used as an array of records. In this case, the string is used as a list of characters. Hence, you obtain rows corresponding to each character.
To simply obtain all the data under the 'applications' header as a dataframe, use:
applied = json_normalize(data, 'applications')
To obtain applied_at
as an individual column, then use:
applied_at = applied.applied_at
or
applied_at = applied['applied_at']
Upvotes: 1