Terry Bu
Terry Bu

Reputation: 899

Pandas python how to transform nested JSON object into rows and columns

I have a json dictionary of thousands of json objects called 'businesses' that contains the following nested JSON in an array. Each 'business' has a 'unique-request-id' that comes back as well. Sometimes, one request could come back with multiple busineses.

{'principals': [{'addresses': [{'city': 'DENVILLE', 'state': 'NJ', 'metadata': {'sources': [], 'lastSeen': None, 'firstSeen': None, 'sourceCount': 0}, 'zip': '07834', 'address': '87 JILLOW AVE'}], 'titles': ['SECRETARY'], 'names': [{'suffix': None, 'firstName': 'JAMES', 'middleName': None, 'lastName': 'VU', 'salutation': None, 'pids': [], 'metadata': {'sources': [], 'lastSeen': None, 'firstSeen': None, 'sourceCount': 0}}, {'suffix': None, 'firstName': 'LAURA', 'middleName': 'A', 'lastName': 'VU', 'salutation': None, 'pids': [], 'metadata': {'sources': [], 'lastSeen': None, 'firstSeen': None, 'sourceCount': 0}}], 'pids': ['1000320219031584'], 'lastSeenDate': '2008-01-01T00:00:00.000Z', 'firstSeenDate': None}, {'addresses': [{'city': 'LAKE HAVASU CITY', 'state': 'AZ', 'metadata': {'sources': [], 'lastSeen': None, 'firstSeen': None, 'sourceCount': 0}, 'zip': '86403', 'address': '1887 WILLOW AVE'}], 'titles': ['MANAGER'], 'names': [{'suffix': None, 'firstName': 'JASON', 'middleName': None, 'lastName': 'ROBERTS', 'salutation': None, 'pids': [], 'metadata': {'sources': [], 'lastSeen': None, 'firstSeen': None, 'sourceCount': 0}}], 'pids': ['1000320147115755'], 'lastSeenDate': '1999-07-28T00:00:00.000Z', 'firstSeenDate': None}], 'tradeNames': ['DORI LAURA DVM'], 'addresses': [{'city': 'HAVASU CITY', 'state': 'NY', 'metadata': {'sources': [], 'lastSeen': None, 'firstSeen': None, 'sourceCount': 0}, 'zip': '16403', 'address': '299 PASEO DEL SOL'}], 'searchType': 'Name+Address', 'contacts': [], 'phones': ['19284532022'], 'registrationDate': '1999-07-29T00:00:00.000Z', 'websites': ['WWW.DORIVET.COM'], 'eid': '39281818', 'sources': {'sources': [], 'lastSeen': None, 'firstSeen': None, 'sourceCount': 5}, 'dataSource': 'iData', 'names': ['DORI VETERINARY CENTER INC', 'DORI-ROBERTS INC'], 'registrationAddress': {'city': 'PHOENIX', 'state': 'AZ', 'metadata': {'sources': [], 'lastSeen': None, 'firstSeen': None, 'sourceCount': 0}, 'zip': '85007', 'address': '1200 W WASHINGTON'}, 'duns': ['042512348'], 'relatedBusinesses': [], 'industryNames': ['VETERINARIAN, ANIMAL SPECIALTIES'], 'officeAddress': None, 'ein': ['77711133333']}

I would like the ultimate CSV from DataFrame to look like this ... with all the flattened columns to the right.

RequestId BusinessName                 BusinessAddress .... 
ABCD      DORI VETERINARY CENTER INC.  87 JILLOW AVENUE
ABCD      SECONDARY VETERINARY CENTER INC.   3 MAIN ST.
XYZV.     xxxbusiness                        20 AUTO ST.

Is this possible using Pandas and DataFrames?

Upvotes: 0

Views: 870

Answers (1)

NYC Coder
NYC Coder

Reputation: 7594

You'd have to do something like below using json_normalize, the problem with this is, you'll have to do it separately for each record path - one for principals, one for tradeNames and so on.

with open('test-so.json') as f:
    data = json.load(f)
df1 = json_normalize(data, 
                 record_path=['principals', 'addresses'], 
                 meta=[['principals', 'lastSeenDate']],
                 errors='ignore')

               city state    zip          address metadata.sources metadata.lastSeen metadata.firstSeen  metadata.sourceCount     principals.lastSeenDate
0          DENVILLE    NJ  07834    87 JILLOW AVE               []              None               None                     0  2008-01-01T00: 00: 00.000Z
1  LAKE HAVASU CITY    AZ  86403  1887 WILLOW AVE               []              None               None                     0  1999-07-28T00: 00: 00.000Z

Or you could use flatten_json directly, it flattens the entire json into one row, check the documentation here - Flatten JSON

Upvotes: 1

Related Questions