Steven Matthews

Reputation: 11275

Comparing Lists

I have 8 lists (jan, feb, mar, apr, may, jun, jul, aug), each of which contains names in list format, e.g.

['John Smith', 'Cat Stevens', 'Andrew Alexander', 'El Gordo Baba', 'Louis le Roy']

etc.

How do I compare these lists in order and see when a name appeared (i.e. subscribed) and when a name disappeared (i.e. unsubscribed)?

So, say John Smith didn't appear until February; I want to have this information. Let's say he unsubscribed in July; I want this information too (this is FAR more important than the former).

Upvotes: 0

Views: 304

Answers (5)

fabmilo

Reputation: 48310

data = {
 'jan': ['John Smith', 'Cat Stevens', 'Andrew Alexander', 'El Gordo Baba'],
 'feb': ['Louis le Roy', 'John Smith'],
 'mar': ['Cat Stevens', 'Louis le Roy']
}

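keys = 'jan feb mar'.split()
# Pair each month with the one that follows it and compare membership.
for m1, m2 in zip(keys, keys[1:]):
    a = set(data[m1])
    b = set(data[m2])
    # sorted() keeps the printed order stable, since sets are unordered.
    print(m1, '\n\tsubscribed:', ','.join(sorted(b - a)), '\n\tquit:', ','.join(sorted(a - b)))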

result:

jan 
    subscribed: Louis le Roy 
    quit: Andrew Alexander,Cat Stevens,El Gordo Baba
feb 
    subscribed: Cat Stevens 
    quit: John Smith

Upvotes: 1

NullUserException

Reputation: 85458

Don't use lists; use sets instead.

You could find who (un)subscribed between jan and feb simply using set difference:

subs = feb - jan
unsubs = jan - feb
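
A minimal sketch of that idea, assuming the months start out as plain lists as in the question. The sample names and the `jan_set` / `feb_set` variables are made up here for illustration:

```python
# Hypothetical sample data; the real lists come from the question.
jan = ['John Smith', 'Cat Stevens', 'Andrew Alexander']
feb = ['John Smith', 'Louis le Roy']

# Convert once, then use set difference in both directions.
jan_set, feb_set = set(jan), set(feb)

subs = feb_set - jan_set      # in feb but not in jan
unsubs = jan_set - feb_set    # in jan but not in feb

print(sorted(subs))    # ['Louis le Roy']
print(sorted(unsubs))  # ['Andrew Alexander', 'Cat Stevens']
```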

That being said, you would be better off following Daenyth's suggestion. Put these in a database and add joined and left date fields; you'll have finer granularity than just months, and you won't need to store duplicate data.

Upvotes: 6

Andrew Clark

Reputation: 208405

Here is a quick example:

jan,feb,mar,apr,may,jun,jul,aug = [1],[1,2],[1,2,3],[1,2,3,4],[2,3,4],[3,4],[4],[]
months = [set(m) for m in [jan,feb,mar,apr,may,jun,jul,aug]]
changes = [(list(b-a), list(a-b)) for a, b in zip(months, months[1:])]

>>> changes
[([2], []), ([3], []), ([4], []), ([], [1]), ([], [2]), ([], [3]), ([], [4])]

Each element in changes is a transition from one month to the next, where the first item in the tuple is a list of all that were added, and the second item in the tuple is a list of all that left.
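
To read the transitions by month, one option is to key each entry by the month it leads into. This is only a sketch: the `names` list and the `report` dict are introduced here for illustration and are not part of the example above.

```python
# Same toy data as above; names and report are hypothetical helpers.
names = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug']
lists = [[1], [1, 2], [1, 2, 3], [1, 2, 3, 4], [2, 3, 4], [3, 4], [4], []]
months = [set(m) for m in lists]

report = {}
# names[1:] labels each transition with the month in which the change shows up.
for label, a, b in zip(names[1:], months, months[1:]):
    report[label] = (sorted(b - a), sorted(a - b))  # (added, left)

print(report['feb'])  # ([2], [])
print(report['may'])  # ([], [1])
```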

Upvotes: 0

rocksportrocker

Reputation: 7419

As a starter:

from collections import defaultdict
dd = dict(jan=(0,jan), feb=(1, feb), ...)

appearances = defaultdict(list)

for k, (i, li) in dd.items():
    for name in li:
        appearances[name].append((i, k))

for name in appearances:
    # Use a separate loop variable so the person's name is not overwritten.
    months = [(month, i) for i, month in sorted(appearances[name])]
    print(name, months)

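For each name you get a list of (month, index) pairs, sorted by index, covering the months in which the name appears. Here index is the position of the month in calendar order. Now you can check for gaps, for the minimal index (first appearance) and for the maximal index (last appearance).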
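
The gap and min/max checks could look roughly like this. It is a sketch under assumptions: the sample data, the `summarize` helper, and the convention that a name still present in the final month has not left are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample data, listed in calendar order.
months = [
    ('jan', ['John Smith', 'Cat Stevens']),
    ('feb', ['John Smith']),
    ('mar', ['John Smith', 'Cat Stevens']),
    ('apr', ['Cat Stevens']),
]

# Collect the month indices in which each name appears.
appearances = defaultdict(list)
for index, (month, names) in enumerate(months):
    for name in names:
        appearances[name].append(index)

def summarize(indices, month_count):
    """Return (first index, last index or None if still present, gap indices)."""
    first, last = min(indices), max(indices)
    gaps = [i for i in range(first, last + 1) if i not in indices]
    left_after = last if last < month_count - 1 else None
    return first, left_after, gaps

summary = {name: summarize(idx, len(months)) for name, idx in appearances.items()}

print(summary['John Smith'])   # (0, 2, [])
print(summary['Cat Stevens'])  # (0, None, [1])
```

Here John Smith was last seen in month index 2 (mar), so he is gone from apr onward, while Cat Stevens is still present but skipped feb.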

Upvotes: 0

Karoly Horvath

Reputation: 96258

data = {
 'jan': ['John Smith', 'Cat Stevens', 'Andrew Alexander', 'El Gordo Baba'],
 'feb': ['Louis le Roy', 'John Smith'],
 'mar': ['Cat Stevens', 'Louis le Roy']
}

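months = ['jan', 'feb', 'mar']  # explicit calendar order; don't rely on dict ordering
subs = {}    # first month in which each name was seen
unsubs = {}  # last month in which a name was seen again (names seen only once are absent)
for mon in months:
    for name in data[mon]:
        if name not in subs:
            subs[name] = mon
        else:
            unsubs[name] = mon

>>> subs
{'John Smith': 'jan', 'Cat Stevens': 'jan', 'Andrew Alexander': 'jan', 'El Gordo Baba': 'jan', 'Louis le Roy': 'feb'}
>>> unsubs
{'John Smith': 'feb', 'Cat Stevens': 'mar', 'Louis le Roy': 'mar'}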

Upvotes: 0
