Reputation: 849

Hierarchical grouping into key-value pairs with Python

I have a list like this:

data = [
{'date':'2017-01-02', 'model': 'iphone5', 'feature':'feature1'},
{'date':'2017-01-02', 'model': 'iphone7', 'feature':'feature2'},
{'date':'2017-01-03', 'model': 'iphone6', 'feature':'feature2'},
{'date':'2017-01-03', 'model': 'iphone6', 'feature':'feature2'},
{'date':'2017-01-03', 'model': 'iphone7', 'feature':'feature3'},
{'date':'2017-01-10', 'model': 'iphone7', 'feature':'feature2'},
{'date':'2017-01-10', 'model': 'iphone7', 'feature':'feature1'},
]

I want to achieve this:

[
   {
      '2017-01-02':[{'iphone5':['feature1']}, {'iphone7':['feature2']}]
   },
   {
      '2017-01-03': [{'iphone6':['feature2']}, {'iphone7':['feature3']}]
   },
   {
      '2017-01-10':[{'iphone7':['feature2', 'feature1']}]
   }
]

I need an efficient way to do this, since there could be a lot of data.

I was trying this:

import itertools
from operator import itemgetter
data = sorted(data, key=itemgetter('date'))
date = itertools.groupby(data, key=itemgetter('date'))

But I'm getting nothing for the value of the 'date' key.

Later I will iterate over this structure to build HTML.

Upvotes: 2

Answers (3)

Matthias Fripp

Reputation: 18625

You can do this efficiently and cleanly with collections.defaultdict. Unfortunately, it's a fairly advanced use and can be hard to read.

from collections import defaultdict
from pprint import pprint

# create a dictionary whose elements are automatically dictionaries of sets
result_dict = defaultdict(lambda: defaultdict(set))

# Construct a dictionary with one key for each date and another dict ('model_dict') 
# as the value.
# The model_dict has one key for each model and a set of features as the value.
for d in data:
    result_dict[d["date"]][d["model"]].add(d["feature"])

# more explicit version:
# for d in data:
#     model_dict = result_dict[d["date"]]   # created automatically if needed
#     feature_set = model_dict[d["model"]]  # created automatically if needed
#     feature_set.add(d["feature"])

# convert the result_dict into the required form
result_list = [
    {   
        date: [
            {phone: list(feature_set)} 
                for phone, feature_set in sorted(model_dict.items())
        ]
    } for date, model_dict in sorted(result_dict.items())
]

pprint(result_list)
# Example output (feature order within a list may vary, since sets are unordered):
# [{'2017-01-02': [{'iphone5': ['feature1']}, {'iphone7': ['feature2']}]},
#  {'2017-01-03': [{'iphone6': ['feature2']}, {'iphone7': ['feature3']}]},
#  {'2017-01-10': [{'iphone7': ['feature2', 'feature1']}]}]
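
If the order in which features first appear matters (as in the desired output, where 'feature2' comes before 'feature1' for 2017-01-10), a set does not guarantee it. A minimal sketch of one possible variation, assuming Python 3.7+ where plain dicts preserve insertion order: use an inner dict as an ordered set. The names group_ordered, grouped and sample are illustrative and not part of the code above.

```python
from collections import defaultdict

def group_ordered(records):
    """Group records by date, then model, keeping features in first-seen order."""
    # date -> model -> {feature: None}; the innermost dict acts as an ordered set
    grouped = defaultdict(lambda: defaultdict(dict))
    for rec in records:
        grouped[rec["date"]][rec["model"]][rec["feature"]] = None
    # convert to the list-of-dicts shape, sorted by date and then by model
    return [
        {date: [{model: list(features)} for model, features in sorted(models.items())]}
        for date, models in sorted(grouped.items())
    ]

# small hypothetical sample, deliberately unsorted and containing a duplicate
sample = [
    {"date": "2017-01-10", "model": "iphone7", "feature": "feature2"},
    {"date": "2017-01-10", "model": "iphone7", "feature": "feature1"},
    {"date": "2017-01-02", "model": "iphone5", "feature": "feature1"},
    {"date": "2017-01-10", "model": "iphone7", "feature": "feature2"},
]
print(group_ordered(sample))
# [{'2017-01-02': [{'iphone5': ['feature1']}]}, {'2017-01-10': [{'iphone7': ['feature2', 'feature1']}]}]
```

The grouping pass is still a single loop over the data; only the final conversion sorts the dates and models.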

Upvotes: 3

McGrady

Reputation: 11477

You can try this approach. td is a dict that maps each model to its index in tl ({model: index}), so the code can check whether a model already has an entry in the list of dicts:

from itertools import groupby
from operator import itemgetter

r = []
for i in groupby(sorted(data, key=itemgetter('date')), key=itemgetter('date')):
    td, tl = {}, []
    for j in i[1]:
        if j["model"] not in td:
            tl.append({j["model"]: [j["feature"]]})
            td[j["model"]] = len(tl) - 1
        elif j["feature"] not in tl[td[j["model"]]][j["model"]]:
            tl[td[j["model"]]][j["model"]].append(j["feature"])
    r.append({i[0]: tl})

Result:

[
  {'2017-01-02': [{'iphone5': ['feature1']}, {'iphone7': ['feature2']}]},
  {'2017-01-03': [{'iphone6': ['feature2']}, {'iphone7': ['feature3']}]},
  {'2017-01-10': [{'iphone7': ['feature2', 'feature1']}]}
]

As a matter of fact, I think the data structure could be simplified; you may not need this much nesting.
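
For illustration, one possible flatter shape is a plain nested dict, {date: {model: [features]}}, which is straightforward to iterate when building HTML later. This is only a sketch under that assumption: the function names group_nested and to_html and the markup are hypothetical, and it does not reproduce the list-of-dicts format asked for in the question.

```python
from html import escape

def group_nested(records):
    """Return {date: {model: [features]}} with duplicates removed, order preserved."""
    nested = {}
    for rec in records:
        features = nested.setdefault(rec["date"], {}).setdefault(rec["model"], [])
        # linear membership test; fine while each model has only a few features
        if rec["feature"] not in features:
            features.append(rec["feature"])
    return nested

def to_html(nested):
    """Render the nested dict as nested <ul> lists (markup is illustrative only)."""
    parts = ["<ul>"]
    for date in sorted(nested):
        parts.append(f"<li>{escape(date)}<ul>")
        for model, features in nested[date].items():
            parts.append(f"<li>{escape(model)}: {escape(', '.join(features))}</li>")
        parts.append("</ul></li>")
    parts.append("</ul>")
    return "".join(parts)

# small hypothetical sample containing a duplicate record
sample = [
    {"date": "2017-01-03", "model": "iphone6", "feature": "feature2"},
    {"date": "2017-01-03", "model": "iphone6", "feature": "feature2"},
    {"date": "2017-01-03", "model": "iphone7", "feature": "feature3"},
]
nested = group_nested(sample)
print(nested)
# {'2017-01-03': {'iphone6': ['feature2'], 'iphone7': ['feature3']}}
print(to_html(nested))
```

With this shape, looking up the features for a given date and model is a direct nested[date][model] access rather than a search through a list of single-key dicts.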

Upvotes: 1

minji

Reputation: 512

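total_result = list()
result = dict()
inner_value = dict()

# NOTE: assumes data is already sorted (or at least grouped) by date
for d in data:
    if d["date"] not in result:
        # a new date starts: store the finished group and reset the state
        if result:
            total_result.append(result)
        result = dict()
        result[d["date"]] = list()
        inner_value = dict()

    if d["model"] not in inner_value:
        inner_value[d["model"]] = set()

    inner_value[d["model"]].add(d["feature"])
    # rebuild this date's list of {model: [features]} dicts
    tmp_v = [{key: list(inner_value[key])} for key in inner_value]
    result[d["date"]] = tmp_v

# store the last group (skipped when data is empty)
if result:
    total_result.append(result)

total_result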

[{'2017-01-02': [{'iphone7': ['feature2']}, {'iphone5': ['feature1']}]},
 {'2017-01-03': [{'iphone6': ['feature2']}, {'iphone7': ['feature3']}]},
 {'2017-01-10': [{'iphone7': ['feature2', 'feature1']}]}]

Upvotes: 0
