Reputation: 899
I have a json dictionary of thousands of json objects called 'businesses' that contains the following nested JSON in an array. Each 'business' has a 'unique-request-id' that comes back as well. Sometimes, one request could come back with multiple busineses.
{'principals': [{'addresses': [{'city': 'DENVILLE', 'state': 'NJ', 'metadata': {'sources': [], 'lastSeen': None, 'firstSeen': None, 'sourceCount': 0}, 'zip': '07834', 'address': '87 JILLOW AVE'}], 'titles': ['SECRETARY'], 'names': [{'suffix': None, 'firstName': 'JAMES', 'middleName': None, 'lastName': 'VU', 'salutation': None, 'pids': [], 'metadata': {'sources': [], 'lastSeen': None, 'firstSeen': None, 'sourceCount': 0}}, {'suffix': None, 'firstName': 'LAURA', 'middleName': 'A', 'lastName': 'VU', 'salutation': None, 'pids': [], 'metadata': {'sources': [], 'lastSeen': None, 'firstSeen': None, 'sourceCount': 0}}], 'pids': ['1000320219031584'], 'lastSeenDate': '2008-01-01T00:00:00.000Z', 'firstSeenDate': None}, {'addresses': [{'city': 'LAKE HAVASU CITY', 'state': 'AZ', 'metadata': {'sources': [], 'lastSeen': None, 'firstSeen': None, 'sourceCount': 0}, 'zip': '86403', 'address': '1887 WILLOW AVE'}], 'titles': ['MANAGER'], 'names': [{'suffix': None, 'firstName': 'JASON', 'middleName': None, 'lastName': 'ROBERTS', 'salutation': None, 'pids': [], 'metadata': {'sources': [], 'lastSeen': None, 'firstSeen': None, 'sourceCount': 0}}], 'pids': ['1000320147115755'], 'lastSeenDate': '1999-07-28T00:00:00.000Z', 'firstSeenDate': None}], 'tradeNames': ['DORI LAURA DVM'], 'addresses': [{'city': 'HAVASU CITY', 'state': 'NY', 'metadata': {'sources': [], 'lastSeen': None, 'firstSeen': None, 'sourceCount': 0}, 'zip': '16403', 'address': '299 PASEO DEL SOL'}], 'searchType': 'Name+Address', 'contacts': [], 'phones': ['19284532022'], 'registrationDate': '1999-07-29T00:00:00.000Z', 'websites': ['WWW.DORIVET.COM'], 'eid': '39281818', 'sources': {'sources': [], 'lastSeen': None, 'firstSeen': None, 'sourceCount': 5}, 'dataSource': 'iData', 'names': ['DORI VETERINARY CENTER INC', 'DORI-ROBERTS INC'], 'registrationAddress': {'city': 'PHOENIX', 'state': 'AZ', 'metadata': {'sources': [], 'lastSeen': None, 'firstSeen': None, 'sourceCount': 0}, 'zip': '85007', 'address': '1200 W WASHINGTON'}, 'duns': ['042512348'], 'relatedBusinesses': [], 'industryNames': ['VETERINARIAN, ANIMAL SPECIALTIES'], 'officeAddress': None, 'ein': ['77711133333']}
I would like the ultimate CSV from DataFrame to look like this ... with all the flattened columns to the right.
RequestId BusinessName BusinessAddress ....
ABCD DORI VETERINARY CENTER INC. 87 JILLOW AVENUE
ABCD SECONDARY VETERINARY CENTER INC. 3 MAIN ST.
XYZV. xxxbusiness 20 AUTO ST.
Is this possible using Pandas and DataFrames?
Upvotes: 0
Views: 870
Reputation: 7594
You'd have to do something like below using json_normalize
, the problem with this is, you'll have to do it separately for each record path - one for principals, one for tradeNames and so on.
with open('test-so.json') as f:
data = json.load(f)
df1 = json_normalize(data,
record_path=['principals', 'addresses'],
meta=[['principals', 'lastSeenDate']],
errors='ignore')
city state zip address metadata.sources metadata.lastSeen metadata.firstSeen metadata.sourceCount principals.lastSeenDate
0 DENVILLE NJ 07834 87 JILLOW AVE [] None None 0 2008-01-01T00: 00: 00.000Z
1 LAKE HAVASU CITY AZ 86403 1887 WILLOW AVE [] None None 0 1999-07-28T00: 00: 00.000Z
Or you could use flatten_json
directly, it flattens the entire json into one row, check the documentation here - Flatten JSON
Upvotes: 1