Reputation: 1079
I am trying to convert a nested JSON array to a pandas DataFrame.
The data looks something like this in list format:
[{u'analysis': {u'active': u'Y',
u'dpv_cmra': u'N',
u'dpv_footnotes': u'AAN1',
u'dpv_match_code': u'D',
u'dpv_vacant': u'N',
u'footnotes': u'H#'},
u'candidate_index': 0,
u'components':
{u'city_name': u'City',
u'delivery_point': u'Variable',
u'delivery_point_check_digit': u'8',
u'plus4_code': u'Variable',
u'primary_number': u'Variable',
u'state_abbreviation': u'Variable',
u'street_name': u'Variable',
u'street_predirection': u'Variable',
u'street_suffix': u'Variable',
u'zipcode': u'Variable'},
u'delivery_line_1': u'Variable',
u'delivery_point_barcode': u'Variable',
u'input_id': u'Variable',
u'input_index': Variable,
u'last_line': u'Variable',
u'metadata':
{u'building_default_indicator': u'Variable',
u'carrier_route': u'Variable',
u'congressional_district': u'Variable',
u'county_fips': u'Variable',
u'county_name': u'Variable',
u'dst': True,
u'zip_type': u'Variable'}}],
Any suggestions on how I can convert this to a data frame and take care of empty values? I've tried using try / except to handle the missing values, but my data frame is then made up of tuples.
Thank You
Upvotes: 3
Views: 1165
Reputation: 109526
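There is a json_normalize function inside pd.io.json (in pandas 1.0 and later it is also available directly as pd.json_normalize, and the pd.io.json path is deprecated). It flattens nested dicts into columns whose names are joined with a dot.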
d = {u'analysis': {u'active': u'Y', u'dpv_cmra': u'N', u'dpv_footnotes': u'AAN1', u'dpv_match_code': u'D', u'dpv_vacant': u'N', u'footnotes': u'H#'}, u'candidate_index': 0, u'components': {u'city_name': u'City', u'delivery_point': u'Variable', u'delivery_point_check_digit': u'8', u'plus4_code': u'Variable', u'primary_number': u'Variable', u'state_abbreviation': u'Variable', u'street_name': u'Variable', u'street_predirection': u'Variable', u'street_suffix': u'Variable', u'zipcode': u'Variable'}, u'delivery_line_1': u'Variable', u'delivery_point_barcode': u'Variable', u'input_id': u'Variable', u'input_index': u'Variable', u'last_line': u'Variable', u'metadata': {u'building_default_indicator': u'Variable', u'carrier_route': u'Variable', u'congressional_district': u'Variable', u'county_fips': u'Variable', u'county_name': u'Variable', u'dst': True, u'zip_type': u'Variable'}}
>>> pd.io.json.json_normalize(d)
analysis.active analysis.dpv_cmra analysis.dpv_footnotes analysis.dpv_match_code analysis.dpv_vacant analysis.footnotes candidate_index components.city_name components.delivery_point components.delivery_point_check_digit ... \
0 Y N AAN1 D N H# 0 City Variable 8 ...
input_id input_index last_line metadata.building_default_indicator metadata.carrier_route metadata.congressional_district metadata.county_fips metadata.county_name metadata.dst metadata.zip_type
0 Variable Variable Variable Variable Variable Variable Variable Variable True Variable
[1 rows x 29 columns]
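To address the "empty values" part of the question, here is a minimal sketch of the same call on a list of records where some keys are missing. The records are made up for illustration (they are not the asker's real data), and the sketch assumes pandas 1.0 or later so that pd.json_normalize exists at the top level. Missing keys should simply come out as NaN, with no try / except needed.

```python
import pandas as pd

# Hypothetical records shaped like the question's data. The second record
# has no 'metadata' dict at all and is missing one 'analysis' key.
records = [
    {
        "candidate_index": 0,
        "analysis": {"active": "Y", "dpv_vacant": "N"},
        "metadata": {"county_name": "Example", "dst": True},
    },
    {
        "candidate_index": 1,
        "analysis": {"active": "N"},
    },
]

# Assumes pandas >= 1.0; on older versions use pd.io.json.json_normalize.
df = pd.json_normalize(records)

# Nested keys become dotted column names; absent keys are filled with NaN.
print(df.columns.tolist())
print(df.isna().sum())
```

From there the usual tools apply, for example df.fillna("") or df.dropna(subset=[...]), depending on how the empty values should be treated.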
Upvotes: 4