Andrew Neeld
Andrew Neeld

Reputation: 113

Python JSON to Pandas Dataframe

I am working with a JSON response that is formatted like a many-nested dictionary below:

{u'addresses': [],
 u'application_ids': [20855193],
 u'applications': [{u'answers': [{u'answer': u'Indeed ',
                                  u'question': u'How did you hear?'}],
                    u'applied_at': u'2015-10-29T22:19:04.925Z',
                    u'candidate_id': 9999999,
                    u'credited_to': None,
                    u'current_stage': {u'id': 9999999,
                                       u'name': u'Application Review'},
                    u'id': 9999999,
                    u'jobs': [{u'id': 9999999,u'name': u'ENGINEER'}],
                    u'last_activity_at': u'2015-10-29T22:19:04.767Z',
                    u'prospect': False,
                    u'rejected_at': None,
                    u'rejection_details': None,
                    u'rejection_reason': None,
                    u'source': {u'id': 7, u'public_name': u'Indeed'},
                    u'status': u'active'}],
 u'attachments': [{u'filename': u'Jason_Bourne.pdf',
                   u'type': u'resume',
                   u'url': u'https://resumeURL'}],
 u'company': None,
 u'coordinator': {u'employee_id': None,
                  u'id': 9999999,
                  u'name': u'Batman_Robin'},
 u'email_addresses': [{u'type': u'personal',
                       u'value': u'[email protected]'}],
 u'first_name': u'Jason',
 u'id': 9999999,
 u'last_activity': u'2015-10-29T22:19:04.767Z',
 u'last_name': u'Bourne',
 u'website_addresses': []}

I am trying to flatten the JSON into a table and have found the following example on the pandas documentation:

http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.io.json.json_normalize.html

For some reason, any data directly under the 'applications' header returns as one character per row. For example, if I call:

timeapplied = json_normalize(data,['applications', ['applied_at']])

I get:

 0
0   2
1   0
2   1
3   5
4   -
5   1
6   0
7   -
8   2
9   9
10  T
11  2
12  2
13  :
14  1
15  9
16  :
17  0
18  4
19  .
20  9
21  2
22  5
23  Z

Is there any way around this so I can use the normalize function?

Thanks!

Upvotes: 2

Views: 869

Answers (1)

Analytical Monk
Analytical Monk

Reputation: 379

Your call:

timeapplied = json_normalize(data,['applications', ['applied_at']])

A call to json_normalize consists of the parameters shown below,

pandas.io.json.json_normalize(data, record_path=None, meta=None, meta_prefix=None, record_prefix=None)

You are passing ['applications', ['applied_at']] as the record_path. Apparently, this means that the data provided under data['applications]['applied_at'] is used as an array of records. In this case, the string is used as a list of characters. Hence, you obtain rows corresponding to each character.

To simply obtain all the data under the 'applications' header as a dataframe, use:

applied = json_normalize(data, 'applications')

To obtain applied_at as an individual column, then use:

applied_at = applied.applied_at

or

applied_at = applied['applied_at']

Upvotes: 1

Related Questions