Mike Girard

Reputation: 464

Best method for repeated searches on large list of dicts

Let's say I have a function that returns 1000 records from a postgres database as a list of dicts that looks like this (but much bigger):

[ {"thing_id" : 245, "thing_title" : "Thing title", "thing_url": "thing-url"},
  {"thing_id" : 459, "thing_title" : "Thing title II", "thing_url": "thing-url/2"}]

I have a process that requires around 600 individual searches on this list for the right dict based on a given unique thing_id. Rather than iterating through the entire list each time, wouldn't it be more efficient to create a dict of dicts, making the thing_id for each dict a key, like this:

{245 : {"thing_id" : 245, "thing_title" : "Thing title", "thing_url": "thing-url"},
 459 : {"thing_id" : 459, "thing_title" : "Thing title II", "thing_url": "thing-url/2"}}

If so, is there a preferred way of doing this? Obviously I could build the dict by iterating through the list, but I was wondering whether there are any built-in methods for this. Also, if there is a better way of repeatedly retrieving data from the same large set of records than what I am proposing here, please let me know.

UPDATE: Ended up going with dict comprehension:

data = {row["thing_id"]: row for row in rows}

where rows is the result from my db query with a psycopg2.extras.DictCursor. Building the dict is fast enough and the lookups are very fast.
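
For anyone comparing the two approaches, here is a minimal sketch of what I mean. The sample rows, the find_by_scan helper, and the wanted_ids list are made up for illustration and stand in for the real query result and the real ~600 searches:

```python
# Hypothetical sample standing in for the rows returned by the query.
rows = [
    {"thing_id": 245, "thing_title": "Thing title", "thing_url": "thing-url"},
    {"thing_id": 459, "thing_title": "Thing title II", "thing_url": "thing-url/2"},
]


def find_by_scan(rows, thing_id):
    """Linear search: walk the list until a matching row is found."""
    return next((row for row in rows if row["thing_id"] == thing_id), None)


# Build the index once; each later lookup is a hash lookup instead of a scan.
by_id = {row["thing_id"]: row for row in rows}

wanted_ids = [459, 245, 999]  # 999 is deliberately absent from the sample

for thing_id in wanted_ids:
    # .get returns None for a missing id instead of raising KeyError
    row = by_id.get(thing_id)
    # Both approaches should agree on every id, present or not.
    assert row == find_by_scan(rows, thing_id)
```

Note that if thing_id were not unique, the comprehension would silently keep only the last row for each id, so this relies on the ids being unique as stated above.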

Upvotes: 0

Views: 94

Answers (2)

nicolas.leblanc

Reputation: 640

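a = [ {"thing_id" : 245, "thing_title" : "Thing title", "thing_url": "thing-url"}, {"thing_id" : 459, "thing_title" : "Thing title II", "thing_url": "thing-url/2"}]
# In Python 3, dict.values() returns a view, so convert it to a list before indexing.
c = [list(b.values())[1] for b in a]  # second value of each dict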

Upvotes: 0

Viktor Kerkez

Reputation: 46636

You can use the pandas DataFrame structure for multi column indexing:

>>> import pandas as pd
>>> result = [
...     {"thing_id" : 245, "thing_title" : "Thing title", "thing_url": "thing-url"},
...     {"thing_id" : 459, "thing_title" : "Thing title II", "thing_url": "thing-url/2"}
... ]
>>> df = pd.DataFrame(result)
>>> df.set_index('thing_id', inplace=True)
>>> df.sort_index(inplace=True)
>>> df
             thing_title    thing_url
thing_id                             
245          Thing title    thing-url
459       Thing title II  thing-url/2
>>> df.loc[459, 'thing_title']
'Thing title II'

Upvotes: 1
