Reputation: 600
I am trying to query a MongoDB collection and return all values of one JSON key, grouped by a pair of two other keys from each document. My attempt is below; it raises
KeyError: ('test1', 'test2')
from collections import OrderedDict
from typing import Dict, List

import pymongo
from tqdm import tqdm


def distros(
        mongo_uri: str,
        mongo_source_db: str,
        mongo_source_collection: str) -> Dict[str, List[str]]:
    distributions_per_param = OrderedDict()
    mongo = pymongo.MongoClient(mongo_uri)
    db = mongo.get_database(mongo_source_db)
    col = db.get_collection(mongo_source_collection)
    query = {}
    total = col.count_documents(query)
    cursor = col.find(query)
    for doc in tqdm(cursor, total=total, desc='distributions'):
        param_one = doc['param_one']
        param_two = doc['param_two']
        if param_one not in distributions_per_param:
            distributions_per_param[param_one] = set()
        if param_two not in distributions_per_param:
            distributions_per_param[param_two] = set()
        distro_value = str(doc['distro']).strip().lower()
        if distro_value:
            distributions_per_param[param_one, param_two].add(distro_value)
        print(doc)
    index = {k: sorted(v) for k, v in distributions_per_param.items()}
    return index
The data is a JSON list of documents queried from MongoDB:
sample_data = [{'param_one': 'x1', 'param_one': 'y2', 'distro': 'test1'},
               {'param_one': 'x1', 'param_one': 'y2', 'distro': 'test2'},
               {'param_one': 'x2', 'param_one': 'y1', 'distro': 'test3'},
               {'param_one': 'x2', 'param_one': 'y1', 'distro': 'test4'}]
I need the resulting data to look like this:
result = [{'x1, y2': ['test1', 'test2']},
          {'x2, y1': ['test3', 'test4']}]
Upvotes: 0
Views: 71
Reputation: 1039
I assume there is a typo in sample_data and that the key for y{1/2} is param_two instead of param_one (keys must be unique in a dictionary, and param_one is already used for x{1/2}).
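The uniqueness rule is easy to check in isolation. As a minimal sketch, using the first document from the question exactly as it was posted: Python accepts a dict literal with a repeated key without raising an error and simply keeps the last value.

```python
# First document from the question, typed as posted (param_one appears twice).
doc = {'param_one': 'x1', 'param_one': 'y2', 'distro': 'test1'}

# The second value overwrites the first, so 'x1' is lost and
# a 'param_two' key never exists in the resulting dictionary.
print(doc)  # {'param_one': 'y2', 'distro': 'test1'}
```

This is why the sample data cannot be grouped on two parameters until the second key is renamed.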
Once you have created your sample_data variable, you can group the values by param_one / param_two as follows:
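from collections import defaultdict


def groupby_params(sample_data):
    # Collect every distro under a combined "param_one, param_two" string key.
    results = defaultdict(list)
    for s in sample_data:
        results[f"{s['param_one']}, {s['param_two']}"].append(s["distro"])
    # One single-key dict per group, matching the requested output shape.
    return [{k: v} for k, v in results.items()]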
This will give you the expected output:
sample_data = [{'param_one': 'x1', 'param_two': 'y2', 'distro': 'test1'},
               {'param_one': 'x1', 'param_two': 'y2', 'distro': 'test2'},
               {'param_one': 'x2', 'param_two': 'y1', 'distro': 'test3'},
               {'param_one': 'x2', 'param_two': 'y1', 'distro': 'test4'}]
groupby_params(sample_data)
[{'x1, y2': ['test1', 'test2']}, {'x2, y1': ['test3', 'test4']}]
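As for the KeyError in the question: it appears to come from creating the sets under the single keys param_one and param_two, but then calling add on the tuple (param_one, param_two), which was never initialised. The sketch below is one possible way to keep the original normalisation (strip, lowercase, sorted unique values) while grouping on the pair. The name group_distros is hypothetical, and it assumes every document has param_one, param_two and distro fields. It accepts any iterable of dicts, so a pymongo cursor such as col.find({}) should work as input in place of the list.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple


def group_distros(docs: Iterable[Mapping[str, object]]) -> Dict[Tuple[str, str], List[str]]:
    """Group normalised 'distro' values by the (param_one, param_two) pair.

    Hypothetical helper: assumes each doc has all three fields.
    """
    grouped = defaultdict(set)
    for doc in docs:
        distro_value = str(doc['distro']).strip().lower()
        if distro_value:
            # defaultdict creates the set on first access to the tuple key,
            # so no KeyError is raised for a pair that has not been seen yet.
            grouped[doc['param_one'], doc['param_two']].add(distro_value)
    # Sort each set so the result is deterministic.
    return {key: sorted(values) for key, values in grouped.items()}


sample_data = [{'param_one': 'x1', 'param_two': 'y2', 'distro': 'test1'},
               {'param_one': 'x1', 'param_two': 'y2', 'distro': 'test2'},
               {'param_one': 'x2', 'param_two': 'y1', 'distro': 'test3'},
               {'param_one': 'x2', 'param_two': 'y1', 'distro': 'test4'}]
print(group_distros(sample_data))
# {('x1', 'y2'): ['test1', 'test2'], ('x2', 'y1'): ['test3', 'test4']}
```

Tuple keys keep the two parameters separate, which avoids ambiguity if a value ever contains a comma. If the "x1, y2" string form is required, the keys can be joined afterwards with ", ".join(key).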
Upvotes: 1