Iterate through Nested, mixed dictionary

Question

I have been using the CrunchBase API, and its output looks like the following example (actual example here):

 output = {'name':'StackOverflow',
       'competitors':[{   'competitor':'bing',
                          'link':'bing.com'},
                      {   'competitor':'google',
                          'link':'google.com'}],
       'acquisition': {'acquired_day': 16,
                       'acquired_month': 12,
                       'acquired_year': 2013,
                       'acquiring_company': {'name': 'Viggle',
                                             'permalink': 'viggle'}}}

(this is just an example).

The point is that the values in the output dict can be unicode strings, ints, lists or dictionaries. These values can in turn hold lists, dicts or unicode strings as well.

How could I iterate through the dict? I tried itertools.product but it only seems to work when the structure of the dict is uniform. My goal is to turn this output JSON file into a csv.

Bach · Accepted Answer

I am not completely sure what you wish to achieve exactly, but if your output is actually one line in the requested CSV, you may need to "flatten" the nested dictionary first.

Assuming your structure is a dict whose values are either "simple" (strings, floats, etc.), dicts, or lists (nested to any depth), and assuming there is some character (for example, "_") that does not appear in any of the keys, you may flatten the dict using the following recursive function (or any similar one):

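def _flatten_items(items, sep, prefix):
  # items is an iterable of (key, value) pairs; prefix is the flattened
  # key path built so far (already ending in sep, or empty at the top).
  _items = []
  for key, value in items:
    _prefix = "{}{}".format(prefix, key)
    if isinstance(value, list):
      # Lists: recurse, using each element's index as its key.
      _items.extend(_flatten_items(list(enumerate(value)), sep=sep,
                    prefix=_prefix+sep))
    elif isinstance(value, dict):
      # Dicts: recurse into their key/value pairs.
      _items.extend(_flatten_items(value.items(), sep=sep,
                    prefix=_prefix+sep))
    else:
      # Simple value: emit it under the full flattened key.
      _items.append((_prefix, value))
  return _items


def flatten_dict(d, sep='_'):
  # Return a single-level dict with keys such as 'competitors_0_link'.
  return dict(_flatten_items(d.items(), sep=sep, prefix=""))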

For example, applied to your output, this gives:

output = {'name':'StackOverflow',
       'competitors':[{   'competitor':'bing',
                          'link':'bing.com'},
                      {   'competitor':'google',
                          'link':'google.com'}],
       'acquisition': {'acquired_day': 16,
                       'acquired_month': 12,
                       'acquired_year': 2013,
                       'acquiring_company': {'name': 'Viggle',
                                             'permalink': 'viggle'}}}

print(flatten_dict(output))
# {'acquisition_acquired_year': 2013, 'acquisition_acquiring_company_name': 'Viggle', 'name': 'StackOverflow', 'acquisition_acquiring_company_permalink': 'viggle', 'competitors_0_competitor': 'bing', 'acquisition_acquired_month': 12, 'competitors_1_link': 'google.com', 'acquisition_acquired_day': 16, 'competitors_1_competitor': 'google', 'competitors_0_link': 'bing.com'}

Then you can use csv.DictWriter (or similar) to write the flattened data to a CSV file.
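A minimal sketch of that last step, assuming each record has already been flattened into a single-level dict. The helper name write_flat_rows and the sample rows are hypothetical, and the code is written for Python 3. Different records may end up with different keys (for instance, a different number of competitors), so this version uses the sorted union of all keys as the header and leaves missing cells empty:

```python
import csv
import io


def write_flat_rows(rows, fileobj):
    """Write already-flattened dicts to fileobj as CSV; return the column order."""
    # Records may not share the same keys, so use the sorted union as the header.
    fieldnames = sorted({key for row in rows for key in row})
    # restval fills in the cells for keys a given record does not have.
    writer = csv.DictWriter(fileobj, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return fieldnames


# Hypothetical sample rows, shaped like the flattened output above.
rows = [
    {"name": "StackOverflow",
     "competitors_0_competitor": "bing",
     "acquisition_acquired_year": 2013},
    {"name": "Example",
     "competitors_0_competitor": "google"},
]

buffer = io.StringIO()  # stands in for a real file in this sketch
columns = write_flat_rows(rows, buffer)
csv_text = buffer.getvalue()
```

To write to disk instead, pass a file opened with open(path, "w", newline="") in place of the StringIO buffer.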
