Sql_Pete_Belfast

Reputation: 600

KeyError when returning all values of 1 key based on set of 2 keys

I am trying to query a MongoDB collection and return all values of one JSON key, grouped by the combination of two other keys. My attempt below raises:

KeyError: ('test1', 'test2')


from collections import OrderedDict
from typing import Dict, List

import pymongo
from tqdm import tqdm


def distros(
        mongo_uri: str,
        mongo_source_db: str,
        mongo_source_collection: str) -> Dict[str, List[str]]:
    distributions_per_param = OrderedDict()

    mongo = pymongo.MongoClient(mongo_uri)
    db = mongo.get_database(mongo_source_db)
    col = db.get_collection(mongo_source_collection)
    query = {}
    total = col.count_documents(query)
    cursor = col.find(query)
    for doc in tqdm(cursor, total=total, desc='distributions'):
        param_one = doc['param_one']
        param_two = doc['param_two']
        if param_one not in distributions_per_param:
            distributions_per_param[param_one] = set()
        if param_two not in distributions_per_param:
            distributions_per_param[param_two] = set()
        distro_value = str(doc['distro']).strip().lower()
        if distro_value:
            distributions_per_param[param_one, param_two].add(distro_value)
            print(doc)
        index = {k: sorted(v) for k, v in distributions_per_param.items()}
    return index

The data is a JSON list of documents queried from MongoDB:

sample_data = [{'param_one': 'x1', 'param_one': 'y2', 'distro': 'test1'},
               {'param_one': 'x1', 'param_one': 'y2', 'distro': 'test2'},
               {'param_one': 'x2', 'param_one': 'y1', 'distro': 'test3'},
               {'param_one': 'x2', 'param_one': 'y1', 'distro': 'test4'}]

I need the resulting data to look like this:

result = [{'x1, y2': ['test1','test2']},
          {'x2, y1': ['test3','test4']}]
     

Upvotes: 0

Views: 71

Answers (1)

M. Perier--Dulhoste

Reputation: 1039

I assume there is a typo in the sample_data and the key for y{1/2} should be param_two instead of param_one (keys must be unique in a dictionary, and param_one is already used for x{1/2}).
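
To illustrate the point about unique keys, here is a minimal sketch (the variable name doc is only for illustration). When a dict literal repeats a key, Python does not raise an error; it silently keeps the last value, so the first entry of the question's sample_data would lose 'x1':

```python
# A dict literal with a repeated key keeps only the last value for that key.
doc = {'param_one': 'x1', 'param_one': 'y2', 'distro': 'test1'}

print(doc)       # {'param_one': 'y2', 'distro': 'test1'}
print(len(doc))  # 2
```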


Once you have created your sample_data variable, you can group the values by the param_one / param_two pair as follows:

from collections import defaultdict

def groupby_params(sample_data):
    results = defaultdict(list)
    
    for s in sample_data:
        results[f"{s['param_one']}, {s['param_two']}"].append(s["distro"])
    
    return [{k: v} for k, v in results.items()]

This will give you the expected output:

sample_data = [{'param_one': 'x1', 'param_two': 'y2', 'distro': 'test1'},
               {'param_one': 'x1', 'param_two': 'y2', 'distro': 'test2'},
               {'param_one': 'x2', 'param_two': 'y1', 'distro': 'test3'},
               {'param_one': 'x2', 'param_two': 'y1', 'distro': 'test4'}]

groupby_params(sample_data)

[{'x1, y2': ['test1', 'test2']}, {'x2, y1': ['test3', 'test4']}]
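
As for the KeyError in the question: the loop initialises a set for each parameter separately, but then indexes the dict with the pair of parameters, a tuple key that was never inserted. If you would rather keep tuple keys and deduplicated, sorted values as in your original attempt, a minimal sketch could look like the following. The function name group_distros is hypothetical, and it assumes every document has the keys param_one, param_two and distro:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple


def group_distros(docs: Iterable[Mapping[str, object]]) -> Dict[Tuple[str, str], List[str]]:
    """Group normalised 'distro' values by the (param_one, param_two) pair."""
    # defaultdict(set) creates the empty set on first access,
    # so a tuple key that has not been seen yet cannot raise KeyError.
    grouped = defaultdict(set)

    for doc in docs:
        distro_value = str(doc['distro']).strip().lower()
        if distro_value:  # skip empty strings, as in the question
            grouped[doc['param_one'], doc['param_two']].add(distro_value)

    # Sets are unordered, so sort each group for a stable result.
    return {key: sorted(values) for key, values in grouped.items()}


sample_data = [{'param_one': 'x1', 'param_two': 'y2', 'distro': 'test1'},
               {'param_one': 'x1', 'param_two': 'y2', 'distro': 'test2'},
               {'param_one': 'x2', 'param_two': 'y1', 'distro': 'test3'},
               {'param_one': 'x2', 'param_two': 'y1', 'distro': 'test4'}]

print(group_distros(sample_data))
# {('x1', 'y2'): ['test1', 'test2'], ('x2', 'y1'): ['test3', 'test4']}
```

Since the function only iterates over its argument, passing a pymongo cursor such as col.find({}) instead of sample_data should work the same way.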

Upvotes: 1
