Shibu
Shibu

Reputation: 1542

Generate data Frame from JSON

I am trying to generate a data frame from JSON.the json I am having is in below format

    {
  eventId: "9668383e-ec96-4d6a-b873-2312dd008e7b",
  eventType: "PlannedCustomerChoiceWasUpdated",
  publishedDate: "2016-05-31T18:52:29.219Z",
  payload: {
    plannedCustomerChoiceId: "e9301a6e-7ccf-4c89-bd05-19c1b9067a61"
  },
  _links: {
    self: {
      href: "http://gbp-router.gapinc.dev:8080/planning-service/_feeds/planning.planning-service.planned-customer-choice-events/entries/d2de62a6-1e0f-430a-bf3f-2df711f64beb"
    },
    source: {
      href: "http://gbp-router.gapinc.dev:8080/planning-service/planning/buy-plan/planned-customer-choices/e9301a6e-7ccf-4c89-bd05-19c1b9067a61"
    }
  }
}

I need the columns as a single records.

This is what i have did till now.

from pandas.io.json import json_normalize
a = { "eventId": "9668383e-ec96-4d6a-b873-2312dd008e7b", "eventType": "PlannedCustomerChoiceWasUpdated", "publishedDate": "2016-05-31T18:52:29.219Z", "payload": { "plannedCustomerChoiceId": "e9301a6e-7ccf-4c89-bd05-19c1b9067a61" }, "_links": { "self": { "href": "http://gbp-router.gapinc.dev:8080/planning-service/_feeds/planning.planning-service.planned-customer-choice-events/entries/d2de62a6-1e0f-430a-bf3f-2df711f64beb" }, "source": { "href": "http://gbp-router.gapinc.dev:8080/planning-service/planning/buy-plan/planned-customer-choices/e9301a6e-7ccf-4c89-bd05-19c1b9067a61" } } }
b = json_normalize(a)
print b

I am unable to achieve in desired format. Can anyone help me out.

using b = pd.DataFrame(a) the data frame is in below format.

_

links  \
plannedCustomerChoiceId                                                NaN   
self                     {u'href': u'http://gbp-router.gapinc.dev:8080/...   
source                   {u'href': u'http://gbp-router.gapinc.dev:8080/...   

                                                      eventId  \
plannedCustomerChoiceId  9668383e-ec96-4d6a-b873-2312dd008e7b   
self                     9668383e-ec96-4d6a-b873-2312dd008e7b   
source                   9668383e-ec96-4d6a-b873-2312dd008e7b   

                                               eventType  \
plannedCustomerChoiceId  PlannedCustomerChoiceWasUpdated   
self                     PlannedCustomerChoiceWasUpdated   
source                   PlannedCustomerChoiceWasUpdated   

                                                      payload  \
plannedCustomerChoiceId  e9301a6e-7ccf-4c89-bd05-19c1b9067a61   
self                                                      NaN   
source                                                    NaN   

                                    publishedDate  
plannedCustomerChoiceId  2016-05-31T18:52:29.219Z  
self                     2016-05-31T18:52:29.219Z  
source                   2016-05-31T18:52:29.219Z 

what I an actually looking for is

9668383e-ec96-4d6a-b873-2312dd008e7b,PlannedCustomerChoiceWasUpdated,2016-05-31T18:52:29.219Z,e9301a6e-7ccf-4c89-bd05-19c1b9067a61,http://gbp-router.gapinc.dev:8080/planning-service/planning/buy-plan/planned-customer-choices/e9301a6e-7ccf-4c89-bd05-19c1b9067a61

Upvotes: 2

Views: 76

Answers (1)

jezrael
jezrael

Reputation: 862851

I think you can first change ordering of columns and then to_csv:

import pandas as pd

from pandas.io.json import json_normalize
a = { "eventId": "9668383e-ec96-4d6a-b873-2312dd008e7b", "eventType": "PlannedCustomerChoiceWasUpdated", "publishedDate": "2016-05-31T18:52:29.219Z", "payload": { "plannedCustomerChoiceId": "e9301a6e-7ccf-4c89-bd05-19c1b9067a61" }, "_links": { "self": { "href": "http://gbp-router.gapinc.dev:8080/planning-service/_feeds/planning.planning-service.planned-customer-choice-events/entries/d2de62a6-1e0f-430a-bf3f-2df711f64beb" }, "source": { "href": "http://gbp-router.gapinc.dev:8080/planning-service/planning/buy-plan/planned-customer-choices/e9301a6e-7ccf-4c89-bd05-19c1b9067a61" } } }
b = json_normalize(a)

b = b[['eventId','eventType','publishedDate','payload.plannedCustomerChoiceId','_links.source.href']]
#print (b)

print (b.to_csv(index=False, header=False))
9668383e-ec96-4d6a-b873-2312dd008e7b,PlannedCustomerChoiceWasUpdated,2016-05-31T18:52:29.219Z,e9301a6e-7ccf-4c89-bd05-19c1b9067a61,http://gbp-router.gapinc.dev:8080/planning-service/planning/buy-plan/planned-customer-choices/e9301a6e-7ccf-4c89-bd05-19c1b9067a61

If need change column names:

import pandas as pd

from pandas.io.json import json_normalize
a = { "eventId": "9668383e-ec96-4d6a-b873-2312dd008e7b", "eventType": "PlannedCustomerChoiceWasUpdated", "publishedDate": "2016-05-31T18:52:29.219Z", "payload": { "plannedCustomerChoiceId": "e9301a6e-7ccf-4c89-bd05-19c1b9067a61" }, "_links": { "self": { "href": "http://gbp-router.gapinc.dev:8080/planning-service/_feeds/planning.planning-service.planned-customer-choice-events/entries/d2de62a6-1e0f-430a-bf3f-2df711f64beb" }, "source": { "href": "http://gbp-router.gapinc.dev:8080/planning-service/planning/buy-plan/planned-customer-choices/e9301a6e-7ccf-4c89-bd05-19c1b9067a61" } } }
b = json_normalize(a)

b.columns = ['self','source','eventId','eventType','plannedCustomerChoiceId','publishedDate']
print (b)
                                                self  \
0  http://gbp-router.gapinc.dev:8080/planning-ser...   

                                              source  \
0  http://gbp-router.gapinc.dev:8080/planning-ser...   

                                eventId                        eventType  \
0  9668383e-ec96-4d6a-b873-2312dd008e7b  PlannedCustomerChoiceWasUpdated   

                plannedCustomerChoiceId             publishedDate  
0  e9301a6e-7ccf-4c89-bd05-19c1b9067a61  2016-05-31T18:52:29.219Z  

b = b[['eventId','eventType','publishedDate','plannedCustomerChoiceId','source']]
print (b)
                                eventId                        eventType  \
0  9668383e-ec96-4d6a-b873-2312dd008e7b  PlannedCustomerChoiceWasUpdated   

              publishedDate               plannedCustomerChoiceId  \
0  2016-05-31T18:52:29.219Z  e9301a6e-7ccf-4c89-bd05-19c1b9067a61   

                                              source  
0  http://gbp-router.gapinc.dev:8080/planning-ser...  

print (b.to_csv(index=False, header=False))
b873-2312dd008e7b,PlannedCustomerChoiceWasUpdated,2016-05-31T18:52:29.219Z,e9301a6e-7ccf-4c89-bd05-19c1b9067a61,http://gbp-router.gapinc.dev:8080/planning-service/planning/buy-plan/planned-customer-choices/e9301a6e-7ccf-4c89-bd05-19c1b9067a61

Upvotes: 1

Related Questions