unique_beast

Reputation: 1480

Parsing key:value pairs in a list

I have inherited a Mongo structure with key:value pairs stored as strings within an array. I need to extract the collected and spent values from the tags below, but I don't see an easy way to do this using the $regex operator described in the Mongo query documentation.

    {
    "_id" : "94204a81-9540-4ba8-bb93-fc5475c278dc",
    "tags" : ["collected:172", "donuts_used:1", "spent:150"]
    }

Ideally, querying with pymongo would return the extracted values in the format below. I don't know how best to return only the values I need. Please advise.

94204a81-9540-4ba8-bb93-fc5475c278dc, 172, 150

Upvotes: 0

Views: 802

Answers (3)

Shaba Abhiram

Reputation: 31

Here is one way to do it, if all you had was that sample JSON object.

Please pay attention to the note about the ordering of tags etc. It is probably best to revise your "schema" so that you can more easily query, collect and aggregate your "tags" as you call them.

import re

# Returns a CSV string of _id, collected, spent
def parse(obj):
    _id         = obj["_id"]
    # This is terribly brittle, since any other tag inserted between
    # 'collected' and 'spent' will shift these indices.
    # It is probably much better to directly query these, or store them as individual
    # entities in your mongo "schema".
    collected   = re.sub(r"collected:(\d+)", r"\1", obj["tags"][0])
    spent       = re.sub(r"spent:(\d+)", r"\1", obj["tags"][2])
    return ", ".join([_id, collected, spent])

# Some sample object
parse_me = {
    "_id" : "94204a81-9540-4ba8-bb93-fc5475c278dc",
    "tags" : ["collected:172", "donuts_used:1", "spent:150"]
}
print(parse(parse_me))
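
The index-based lookup above depends on the order of the tags. As a minimal sketch of an order-independent variant, assuming every useful tag has the form key:value, the tags can first be turned into a dictionary and then looked up by name. The helper names tags_to_dict and parse_by_key are hypothetical, and returning an empty string for a missing key is an assumption rather than something the question specifies.

```python
def tags_to_dict(tags):
    """Turn ["collected:172", "spent:150"] into {"collected": "172", "spent": "150"}."""
    pairs = {}
    for tag in tags:
        # partition() splits on the first ':' only; tags without ':' are skipped
        key, sep, value = tag.partition(":")
        if sep:
            pairs[key] = value
    return pairs


def parse_by_key(obj):
    """Return '_id, collected, spent' regardless of tag order."""
    values = tags_to_dict(obj["tags"])
    # Assumption: a missing key is reported as an empty string
    return ", ".join([obj["_id"], values.get("collected", ""), values.get("spent", "")])


# Same sample document, with the tags deliberately shuffled
sample = {
    "_id": "94204a81-9540-4ba8-bb93-fc5475c278dc",
    "tags": ["spent:150", "donuts_used:1", "collected:172"],
}
print(parse_by_key(sample))
```

Looking values up by key means a newly added tag, or a different tag order, does not change the result.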

Upvotes: 0

chishaku

Reputation: 4643

>>> print(d['_id'] + ' ' + ' '.join(
...     x.replace('collected:', '').replace('spent:', '')
...     for x in d['tags'] if x.startswith(('collected:', 'spent:'))))
94204a81-9540-4ba8-bb93-fc5475c278dc 172 150

Upvotes: 1

B.Mr.W.

Reputation: 19648

In case you are having a hard time writing the Mongo query (the elements in your list are actually strings rather than key-value pairs, so they require parsing), here is a solution in plain Python that might be helpful.

>>> import pymongo
>>> from pymongo import MongoClient
>>> client = MongoClient('localhost', 27017)
>>> db = client['test']
>>> collection = db['stackoverflow']
>>> collection.find_one()
{u'_id': u'94204a81-9540-4ba8-bb93-fc5475c278dc', u'tags': [u'collected:172', u'donuts_used:1', u'spent:150']}
>>> record = collection.find_one()
>>> print record['_id'], record['tags'][0].split(':')[-1], record['tags'][2].split(':')[-1]
94204a81-9540-4ba8-bb93-fc5475c278dc 172 150

Instead of using find_one(), you can retrieve all the records with the appropriate function (such as find()) and loop through every record. I am not sure how consistent your data is, so I hard-coded the first and third elements of the list. You may want to tweak that part and add a try/except at the record level.
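
As a rough sketch of that loop, assuming each record looks like the sample document: the function below accepts any iterable of documents, so with pymongo it could be fed the cursor returned by collection.find(). The names extract_rows and wanted are made up for illustration, and silently skipping malformed records is an assumption about the desired behaviour. It is written for Python 3.

```python
def extract_rows(records, wanted=("collected", "spent")):
    """Yield '_id value value' strings; skip records that cannot be parsed."""
    for record in records:
        try:
            # Build a key -> value lookup from the "key:value" strings
            values = dict(tag.split(":", 1) for tag in record["tags"])
            row = [str(record["_id"])] + [values[key] for key in wanted]
        except (KeyError, ValueError, TypeError, AttributeError):
            # KeyError: missing field or tag; ValueError: tag without ':';
            # TypeError / AttributeError: tags is not a list of strings
            continue
        yield " ".join(row)


docs = [
    {"_id": "94204a81-9540-4ba8-bb93-fc5475c278dc",
     "tags": ["collected:172", "donuts_used:1", "spent:150"]},
    {"_id": "incomplete-record", "tags": ["collected:1"]},  # no 'spent' tag
]
for line in extract_rows(docs):
    print(line)
```

With a live connection, the list docs would be replaced by something like collection.find(). Because the lookup is by key rather than by position, the try/except only has to deal with records that are genuinely malformed.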

Upvotes: 1
