iNoob

Reputation: 1395

How can I group items in Python by the value of a single field?

I've got a set of data that contains a username, an email address, and a location.

Although every username is different, some usernames share the same email address because they belong to the same owner.

Now I'm trying to work out some logic to group all the usernames that share an email address. Each shared email address should be sent a single email listing all of its usernames.

Usernames with a unique email address will get an individual email.

I went down the following route as far as logic goes, but it doesn't work as well as I'd like (it doesn't add all the users needed to same_email). Any advice on the best approach would be appreciated.

emails = []
same_email = []
not_same = []
for data in user_list:
    email = data[2]
    if email not in emails:
        emails.append(email)
    elif email in emails:
        same_email.append(data)
    for email_d in same_email:
        if email not in email_d[2]:
            not_same.append((data,))

Upvotes: 0

Views: 55

Answers (2)

Gautham Kumaran

Reputation: 441

In most cases, using the functions the language already provides is better than coming up with your own implementation.

defaultdict initialises missing entries of a dict with the provided type, which is list in your case.

groupby() can be used to group any iterable by a key value. Note that it only merges consecutive items with the same key, so the input has to be sorted by that key first.

Sample implementation below.

import collections
import itertools

data = [('a','some_data','[email protected]'),
    ('aa','some_data','[email protected]'),
    ('b','some_data','[email protected]')]

email_group = collections.defaultdict(list)
# groupby() only merges adjacent items, so sort by the same key first
for k, v in itertools.groupby(sorted(data, key=lambda x: x[2]), key=lambda x: x[2]):
    email_group[k] = list(v)

output:

defaultdict(list,
        {'[email protected]': [('a', 'some_data', '[email protected]'),
          ('aa', 'some_data', '[email protected]')],
         '[email protected]': [('b', 'some_data', '[email protected]')]})
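
One caveat worth checking for yourself: groupby() starts a new group every time the key changes, so unsorted input yields the same key more than once. The sketch below is only an illustration. The example.com addresses and the row order are made up, not taken from the data above. It compares groupby() on unsorted rows with a plain defaultdict(list) append loop, which does not care about input order.

```python
import collections
import itertools

# Hypothetical sample data; the addresses are placeholders, not real values.
data = [
    ('a', 'some_data', 'shared@example.com'),
    ('b', 'some_data', 'solo@example.com'),
    ('aa', 'some_data', 'shared@example.com'),
]

# Unsorted input: groupby() opens a new group each time the key changes,
# so 'shared@example.com' shows up twice in the list of keys.
unsorted_keys = [k for k, _ in itertools.groupby(data, key=lambda x: x[2])]

# Appending into a defaultdict(list) gives one bucket per address
# regardless of the order the rows arrive in.
email_group = collections.defaultdict(list)
for row in data:
    email_group[row[2]].append(row)
```

If the input cannot be sorted cheaply, the append loop is probably the simpler choice; groupby() fits best when the data is already ordered by email.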

Upvotes: 0

Charles Duffy

Reputation: 295403

Use a defaultdict to easily bucket your items, then iterate through that dictionary when it's complete.

A set, by contrast, is the best structure to use for an unordered group of items, such as the set of email addresses with only one account. Here, we're using one set (regular_email_addresses) for all email addresses tied to only a single account, and then another set per email address to store the account data tied to it.

import collections

# whenever a lookup is done in this dict, and no entry already exists, call set() to
# create the new, default entry.
userdata_by_email = collections.defaultdict(set)

for data in user_list:
    email = data[2]
    # add always works here, because defaultdict is creating a new set
    userdata_by_email[email].add(tuple(data)) # data can't be a list here, so cast to tuple

regular_email_addresses = set()
for email, userdata_set in userdata_by_email.items():  # items() works on Python 2 and 3
    if len(userdata_set) == 1:
        regular_email_addresses.add(email)
    else:
        send_special_email(email, userdata_set)

send_bulk_email(regular_email_addresses)

Fill in your own implementation of send_special_email (to send an email to a single address with multiple usernames) and send_bulk_email (to send an email to all the addresses with only one username), and you're done.
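
As a rough end-to-end sketch of how that could fit together, here is the same loop with stand-in senders. Everything specific here is an assumption for illustration: the sample user_list, its (username, location, email) layout, and the stub bodies, which only record what would be sent instead of sending anything.

```python
import collections

sent = []  # records what would be sent, so the sketch has no side effects


def send_special_email(email, userdata_set):
    # Hypothetical stub: one message to a shared address, listing every username.
    usernames = sorted(user[0] for user in userdata_set)
    sent.append((email, usernames))


def send_bulk_email(addresses):
    # Hypothetical stub: one plain message per single-account address.
    for email in sorted(addresses):
        sent.append((email, None))


# Assumed layout: (username, location, email), so the email is at index 2.
user_list = [
    ['alice', 'London', 'shared@example.com'],
    ['bob', 'Paris', 'bob@example.com'],
    ['alice2', 'London', 'shared@example.com'],
]

userdata_by_email = collections.defaultdict(set)
for data in user_list:
    # Lists are unhashable, so convert each row to a tuple before adding it.
    userdata_by_email[data[2]].add(tuple(data))

regular_email_addresses = set()
for email, userdata_set in userdata_by_email.items():
    if len(userdata_set) == 1:
        regular_email_addresses.add(email)
    else:
        send_special_email(email, userdata_set)

send_bulk_email(regular_email_addresses)
```

With this sample input, the shared address gets one message naming both alice and alice2, and bob's address goes through the bulk path.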

Upvotes: 1
