MYK

Reputation: 2997

How do you efficiently search across a group of lists in Python?

In Python I have a group of lists that track information about some users:

user_id = [1,2,3,4,5]
user_name = ['bob', 'alice', 'jerry', 'lisa', 'tom']
user_email = ['[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]']
...

where the i-th elements of the lists correspond to each other (index i describes the same user in every list).

I want to get user info "x" given info "y". In most cases I'd use a dictionary for this for the constant lookup time, but I don't want to build and maintain dozens of dictionaries.

If I maintain a dictionary for every pair of lists shown above, I'd have:

name:email
email:name
name:id
id:name
email:id
id:email

which is already getting unmanageable, and the number of dictionaries grows quadratically with the number of attributes.
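To put rough numbers on that growth (counting user_id among the n attributes): one dictionary per ordered pair of attributes gives n(n-1) dictionaries, while routing every lookup through user_id needs two per non-id attribute, 2(n-1). A small sketch of both counts; the helper names are made up for illustration:

```python
from itertools import permutations

def pairwise_dict_count(n_attributes):
    # One dictionary per ordered (from, to) pair of distinct attributes
    return len(list(permutations(range(n_attributes), 2)))

def hub_dict_count(n_attributes):
    # One "value -> id" and one "id -> value" dictionary per non-id attribute
    return 2 * (n_attributes - 1)

for n in (3, 5, 10):
    print(n, pairwise_dict_count(n), hub_dict_count(n))
# 3 6 4
# 5 20 8
# 10 90 18
```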

I could have everything map to user_id and then need only 2n dictionaries, but I'd be happy to learn of a more appropriate data structure for this use case.

To illustrate how the code is currently implemented:

def get_email_by_user_id(target_id):
    # Linear scan through both lists on every call: O(n) per lookup
    return [email for email, uid in zip(user_email, user_id) if uid == target_id][0]

As you can imagine, very slow :P

Upvotes: 3

Views: 242

Answers (3)

MYK

Reputation: 2997

In the end I took the only option that gave me the performance I needed.

I decided the contents of user_id are the canonical identifier.

I then created the following dictionaries:

def make_dictionaries(user_id, other_lists=None):
    # Build the default at call time instead of binding the global lists at definition time
    if other_lists is None:
        other_lists = [('user_name', user_name), ('user_email', user_email)]
    to_id_dictionary = {}    # column name -> {value: uid}
    from_id_dictionary = {}  # column name -> {uid: value}

    for list_name, list_content in other_lists:
        from_id_dictionary[list_name] = {uid: cont for uid, cont in zip(user_id, list_content)}
        to_id_dictionary[list_name] = {cont: uid for uid, cont in zip(user_id, list_content)}

    return to_id_dictionary, from_id_dictionary
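One caveat with to_id_dictionary: it assumes every value in a column is unique. If two users share a name, the comprehension silently keeps only the last uid for that name. A sketch of a stricter variant that raises instead; make_dictionaries_checked is a hypothetical name, not part of the answer:

```python
def make_dictionaries_checked(user_id, other_lists):
    """Like make_dictionaries, but refuses columns with repeated values.

    A reverse (value -> uid) dictionary would otherwise keep only the
    last uid seen for a repeated value.
    """
    to_id_dictionary = {}
    from_id_dictionary = {}
    for list_name, list_content in other_lists:
        # Parallel lists must line up one-to-one with user_id
        if len(list_content) != len(user_id):
            raise ValueError(f"{list_name} has {len(list_content)} items, expected {len(user_id)}")
        from_id_dictionary[list_name] = dict(zip(user_id, list_content))
        reverse = {}
        for uid, value in zip(user_id, list_content):
            if value in reverse:
                raise ValueError(f"{list_name} value {value!r} is shared by ids {reverse[value]} and {uid}")
            reverse[value] = uid
        to_id_dictionary[list_name] = reverse
    return to_id_dictionary, from_id_dictionary

user_id = [1, 2, 3, 4, 5]
user_name = ['bob', 'alice', 'jerry', 'lisa', 'tom']

to_id, from_id = make_dictionaries_checked(user_id, [('user_name', user_name)])
print(to_id['user_name']['alice'])  # 2
print(from_id['user_name'][4])      # lisa
```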

I can then do:

to_id_dictionary, from_id_dictionary = make_dictionaries(user_id)

def get_email_by_user_name(name):
    uid = to_id_dictionary['user_name'][name]     # Get UID from name
    return from_id_dictionary['user_email'][uid]  # Get email from UID
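Rather than writing one get_X_by_Y function per pair of attributes, the same two levels of dictionaries can back a single generic lookup. A self-contained sketch; lookup is a hypothetical name, and the example.com addresses are placeholders because the real ones are obfuscated in the question:

```python
# Placeholder data (hypothetical addresses)
user_id = [1, 2, 3, 4, 5]
user_name = ['bob', 'alice', 'jerry', 'lisa', 'tom']
user_email = [f'{name}@example.com' for name in user_name]

columns = {'user_name': user_name, 'user_email': user_email}
from_id = {col: dict(zip(user_id, values)) for col, values in columns.items()}
to_id = {col: {v: uid for uid, v in zip(user_id, values)} for col, values in columns.items()}

def lookup(from_attr, value, to_attr):
    """Return to_attr for the user whose from_attr equals value.

    At most two dictionary lookups; raises KeyError if the value is unknown.
    """
    uid = value if from_attr == 'user_id' else to_id[from_attr][value]
    return uid if to_attr == 'user_id' else from_id[to_attr][uid]

print(lookup('user_name', 'alice', 'user_email'))         # alice@example.com
print(lookup('user_email', 'tom@example.com', 'user_id'))  # 5
print(lookup('user_id', 3, 'user_name'))                   # jerry
```

Adding a new attribute then means adding one entry to columns, not a new function per pair.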

Upvotes: 1

Keith

Reputation: 43024

Since the data is related, it can be organized into a single list of tuples, one tuple per user holding that user's value from each column.

DATA = [
    (1, 'bob', '[email protected]'),
    (2, 'alice', '[email protected]'),
    (3, 'jerry', '[email protected]'),
    (4, 'lisa', '[email protected]'),
    (5, 'tom', '[email protected]'),
]

Then a general function can be written that checks only the columns you are interested in.

def find_user(user_id=None, user_name=None, user_email=None):
    """Find first user matching given criteria.

    A None value means "don't care".

    Returns tuple of (id, name, email) if found, otherwise None.
    """
    # Collect desired criteria into mapping of record index to desired index value.
    criteria_cols = {i: c for (i, c) in enumerate((user_id, user_name, user_email)) if c is not None}
    for rec in DATA:
        if all(rec[idx] == criteria for (idx, criteria) in criteria_cols.items()):
            return rec  # return early if found.

The function checks only the arguments that are not None and returns the first record matching all of them. If no record matches, the loop falls through and the function returns None implicitly.

print(find_user(user_id=1))
print(find_user(user_id=2))
print(find_user(user_name="alice"))
print(find_user(user_email="[email protected]"))
print(find_user(user_id=3, user_email="[email protected]"))
print(find_user(user_id=2, user_email="[email protected]"))
print(find_user(user_id=3, user_name="jerry"))

This prints:

(1, 'bob', '[email protected]')
(2, 'alice', '[email protected]')
(2, 'alice', '[email protected]')
(3, 'jerry', '[email protected]')
(3, 'jerry', '[email protected]')
None
(3, 'jerry', '[email protected]')
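A variant of the same idea (not from the answer) uses collections.namedtuple so the search can refer to fields by name instead of by tuple position; any keyword becomes a criterion. The User type, the find function and the example.com addresses are hypothetical placeholders:

```python
from collections import namedtuple

# Hypothetical record type; field names mirror the question's lists
User = namedtuple('User', ['id', 'name', 'email'])

DATA = [
    User(1, 'bob', 'bob@example.com'),
    User(2, 'alice', 'alice@example.com'),
    User(3, 'jerry', 'jerry@example.com'),
    User(4, 'lisa', 'lisa@example.com'),
    User(5, 'tom', 'tom@example.com'),
]

def find(**criteria):
    """Return the first User whose fields equal all keyword criteria, else None."""
    for rec in DATA:
        if all(getattr(rec, field) == value for field, value in criteria.items()):
            return rec
    return None

print(find(name='alice'))        # User(id=2, name='alice', email='alice@example.com')
print(find(id=2, name='jerry'))  # None
```

This is still a linear scan per call, like find_user; it only improves readability, not lookup speed.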

Upvotes: 0

user3234810

Reputation: 482

# Dict for holding your data, keyed by user id
data = {}

# Put all your records into data
for uid, name, email in zip(user_id, user_name, user_email):
    data[uid] = {"id": uid, "username": name, "email": email}

# Function for looking up a record by field name and value (linear scan, O(n))
def lookup_info(key_name, lookup_value, data):
    '''
    Takes a field name, a lookup value and a dictionary of records.

    Returns the first record whose field key_name equals lookup_value,
    or None if there is no match.
    '''
    for record in data.values():
        if record[key_name] == lookup_value:
            return record
    return None
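lookup_info scans every record on each call. If lookups are frequent, one option is to build a secondary index per field once, mapping each field value straight to its record. A sketch assuming field values are unique; build_index and the example.com addresses are hypothetical:

```python
# Placeholder data (hypothetical addresses)
user_id = [1, 2, 3, 4, 5]
user_name = ['bob', 'alice', 'jerry', 'lisa', 'tom']
user_email = [f'{name}@example.com' for name in user_name]

data = {uid: {"id": uid, "username": name, "email": email}
        for uid, name, email in zip(user_id, user_name, user_email)}

def build_index(key_name, data):
    """Map each value of field key_name to its record (assumes unique values)."""
    return {record[key_name]: record for record in data.values()}

# One index per field; every index points at the same record dicts
indexes = {key: build_index(key, data) for key in ("id", "username", "email")}

print(indexes["username"]["lisa"]["email"])        # lisa@example.com
print(indexes["email"].get("nobody@example.com"))  # None
```

Because the indexes share the record dicts, editing a record in place is visible through every index, but adding or removing a user means updating each index as well.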

Upvotes: 0
