Reputation: 11275
I have 8 lists (jan, feb, mar, apr, may, jun, jul, aug), each of which contains names, e.g.
['John Smith', 'Cat Stevens', 'Andrew Alexander', 'El Gordo Baba', 'Louis le Roy']
etc.
How do I compare these lists in order and see when a name appeared (i.e. subscribed) and when it disappeared (i.e. unsubscribed)?
So, say John Smith didn't appear until February; I want this information. Let's say he unsubscribed in July; I want this information too (this is FAR more important than the former).
Upvotes: 0
Views: 304
Reputation: 48310
```python
data = {
    'jan': ['John Smith', 'Cat Stevens', 'Andrew Alexander', 'El Gordo Baba'],
    'feb': ['Louis le Roy', 'John Smith'],
    'mar': ['Cat Stevens', 'Louis le Roy'],
}

keys = 'jan feb mar'.split()  # months in chronological order
for m1, m2 in zip(keys, keys[1:]):  # consecutive month pairs
    a = set(data[m1])
    b = set(data[m2])
    # b - a: present in m2 but not m1 (subscribed); a - b: gone in m2 (quit)
    print(m1, '->', m2)
    print('  subscribed:', ', '.join(sorted(b - a)))
    print('  quit:', ', '.join(sorted(a - b)))
```
Result:
```
jan -> feb
  subscribed: Louis le Roy
  quit: Andrew Alexander, Cat Stevens, El Gordo Baba
feb -> mar
  subscribed: Cat Stevens
  quit: John Smith
```
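The same pairwise set difference can be extended to record, per name, when it joined and when it left, which is closer to what the question asks for. A minimal sketch; the helper `track_changes` is hypothetical, names present in the first month are assumed to have joined in that month, and someone who leaves and later returns is treated as still subscribed.

```python
def track_changes(data, order):
    """Compare consecutive months and return (joined, left) dicts.

    joined: name -> month in which the name first appears.
    left:   name -> month in which the name most recently disappeared,
            only for names that are absent from the final month.
    """
    first = order[0]
    # Assumption: everyone in the first month joined in that month.
    joined = {name: first for name in data[first]}
    left = {}
    for prev, cur in zip(order, order[1:]):
        a, b = set(data[prev]), set(data[cur])
        for name in b - a:
            joined.setdefault(name, cur)  # keep the earliest appearance
            left.pop(name, None)          # rejoined, so no longer "left"
        for name in a - b:
            left[name] = cur              # first month the name is missing
    return joined, left


data = {
    'jan': ['John Smith', 'Cat Stevens', 'Andrew Alexander', 'El Gordo Baba'],
    'feb': ['Louis le Roy', 'John Smith'],
    'mar': ['Cat Stevens', 'Louis le Roy'],
}
joined, left = track_changes(data, ['jan', 'feb', 'mar'])
for name in sorted(left):
    print(name, 'joined in', joined[name], 'and left in', left[name])
```

With this sample data, Cat Stevens is not reported as having left, because he reappears in March.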
Upvotes: 1
Reputation: 85458
Don't use lists; use sets instead.
You can then find who (un)subscribed between `jan` and `feb` simply by using set difference:
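```python
jan, feb = set(jan), set(feb)  # set difference requires sets, not lists
subs = feb - jan    # in feb but not in jan: new subscribers
unsubs = jan - feb  # in jan but not in feb: unsubscribers
```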
That being said, you would be better off following Daenyth's suggestion: put these in a database and add `joined` and `left` date fields. You'll get finer granularity than whole months, and you won't need to store duplicated data.
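A minimal sketch of the database approach using Python's standard-library `sqlite3` module. The `members` table, its `joined` and `left_on` columns (`left_on` sidesteps the SQL keyword `LEFT`), and the sample rows are all hypothetical. Dates are stored as ISO-8601 strings so that string comparison matches chronological order, and `left_on` stays NULL while someone is still subscribed.

```python
import sqlite3

# Hypothetical schema: one row per member, with join and leave dates.
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE members (name TEXT PRIMARY KEY, joined TEXT NOT NULL, left_on TEXT)'
)

# Hypothetical sample rows (ISO-8601 dates; None means still subscribed).
conn.executemany(
    'INSERT INTO members (name, joined, left_on) VALUES (?, ?, ?)',
    [
        ('John Smith', '2011-02-03', '2011-07-15'),
        ('Cat Stevens', '2011-01-10', None),
        ('Louis le Roy', '2011-01-20', '2011-03-02'),
    ],
)


def left_between(conn, start, end):
    """Names whose leave date falls in the half-open range [start, end)."""
    rows = conn.execute(
        'SELECT name FROM members WHERE left_on >= ? AND left_on < ? ORDER BY name',
        (start, end),
    )
    return [name for (name,) in rows]


def active_on(conn, day):
    """Names that had joined by `day` and had not yet left."""
    rows = conn.execute(
        'SELECT name FROM members '
        'WHERE joined <= ? AND (left_on IS NULL OR left_on > ?) ORDER BY name',
        (day, day),
    )
    return [name for (name,) in rows]


print(left_between(conn, '2011-07-01', '2011-08-01'))  # who unsubscribed in July
```

Each person is stored once, and "who left in July?" becomes a single range query instead of a comparison of two monthly lists.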
Upvotes: 6
Reputation: 208405
Here is a quick example:
```python
jan, feb, mar, apr, may, jun, jul, aug = [1], [1, 2], [1, 2, 3], [1, 2, 3, 4], [2, 3, 4], [3, 4], [4], []
months = [set(m) for m in [jan, feb, mar, apr, may, jun, jul, aug]]
# Pair each month with the next: (b - a) joined, (a - b) left.
changes = [(list(b - a), list(a - b)) for a, b in zip(months, months[1:])]
```
```
>>> changes
[([2], []), ([3], []), ([4], []), ([], [1]), ([], [2]), ([], [3]), ([], [4])]
```
Each element in `changes` is a transition from one month to the next: the first item in the tuple is a list of everyone who was added, and the second item is a list of everyone who left.
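If you also want to know which transition each tuple describes, one option is to zip in the month labels and key the result by the later month. A small sketch; `labels` and `changes_by_month` are illustrative names, and `sorted` replaces `list` only to make the order within each list predictable.

```python
labels = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug']
jan, feb, mar, apr, may, jun, jul, aug = [1], [1, 2], [1, 2, 3], [1, 2, 3, 4], [2, 3, 4], [3, 4], [4], []
months = [set(m) for m in [jan, feb, mar, apr, may, jun, jul, aug]]

# Key each (added, left) pair by the month in which the change is first visible.
changes_by_month = {
    label: (sorted(b - a), sorted(a - b))
    for label, a, b in zip(labels[1:], months, months[1:])
}

for label, (added, left) in changes_by_month.items():
    print(label, 'added:', added, 'left:', left)
```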
Upvotes: 0
Reputation: 7419
As a starter:
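```python
from collections import defaultdict

# Map each month name to (month index, list of names in that month).
dd = dict(jan=(0, jan), feb=(1, feb))  # ... extend with the remaining months the same way

appearances = defaultdict(list)
for k, (i, li) in dd.items():
    for name in li:
        appearances[name].append((i, k))  # (month index, month name)

for name in appearances:
    # Sort by month index, then flip each pair to (month, index).
    months = [(month, i) for i, month in sorted(appearances[name])]
    print(name, months)
```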
For each name you get this sorted list of (month, index) pairs for the months in which the name appears, where `index` is the position of the month. Now you can check for gaps, and for the minimal and maximal index (the first and last month the name appears).
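A rough sketch of those checks. To stay self-contained it rebuilds a simpler version of `appearances` that stores only month indices (0 for jan through 7 for aug); `build_appearances`, `summarize`, and `MONTHS` are hypothetical names. The first and last indices give the first and last month a name was seen, and any index missing in between is a gap.

```python
from collections import defaultdict

MONTHS = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug']


def build_appearances(data):
    """Map each name to a sorted list of month indices in which it appears."""
    appearances = defaultdict(list)
    for i, month in enumerate(MONTHS):
        for name in data.get(month, []):
            appearances[name].append(i)
    return {name: sorted(idx) for name, idx in appearances.items()}


def summarize(indices):
    """Return (first month, last month, months missing in between)."""
    first, last = indices[0], indices[-1]
    present = set(indices)
    gaps = [MONTHS[i] for i in range(first, last + 1) if i not in present]
    return MONTHS[first], MONTHS[last], gaps


data = {
    'jan': ['John Smith', 'Cat Stevens', 'Andrew Alexander', 'El Gordo Baba'],
    'feb': ['Louis le Roy', 'John Smith'],
    'mar': ['Cat Stevens', 'Louis le Roy'],
}
summary = {name: summarize(idx) for name, idx in build_appearances(data).items()}
for name, (first, last, gaps) in summary.items():
    print(name, 'first:', first, 'last:', last, 'gaps:', gaps)
```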
Upvotes: 0
Reputation: 96258
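```python
data = {
    'jan': ['John Smith', 'Cat Stevens', 'Andrew Alexander', 'El Gordo Baba'],
    'feb': ['Louis le Roy', 'John Smith'],
    'mar': ['Cat Stevens', 'Louis le Roy'],
}

subs = {}    # name -> first month the name appears
unsubs = {}  # name -> latest month a name was seen, for names seen more than once
# Iterate in explicit month order: a plain dict is not guaranteed
# to preserve insertion order before Python 3.7.
for mon in ['jan', 'feb', 'mar']:
    for name in data[mon]:
        if name not in subs:
            subs[name] = mon
        else:
            unsubs[name] = mon
```
```
>>> subs
{'John Smith': 'jan', 'Cat Stevens': 'jan', 'Andrew Alexander': 'jan', 'El Gordo Baba': 'jan', 'Louis le Roy': 'feb'}
>>> unsubs
{'John Smith': 'feb', 'Cat Stevens': 'mar', 'Louis le Roy': 'mar'}
```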
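The same single pass can answer the question more directly if it records both the first and the last month each name is seen and walks the months in an explicit order (a plain dict only guarantees insertion order from Python 3.7 on). A minimal sketch; `first_and_last` and `unsubscribe_month` are hypothetical helpers, the unsubscribe month is assumed to be the month after a name was last seen, names present in the final month count as still subscribed, and gaps (leaving and later rejoining) are ignored.

```python
def first_and_last(data, order):
    """Return (first_seen, last_seen) dicts mapping name -> month."""
    first_seen, last_seen = {}, {}
    for mon in order:
        for name in data.get(mon, []):
            first_seen.setdefault(name, mon)  # only the earliest month sticks
            last_seen[name] = mon             # overwritten until the latest month
    return first_seen, last_seen


def unsubscribe_month(last_seen, order):
    """Assumed unsubscribe month: the month after a name was last seen.

    Names seen in the final month are omitted (treated as still subscribed).
    """
    result = {}
    for name, mon in last_seen.items():
        i = order.index(mon)
        if i + 1 < len(order):
            result[name] = order[i + 1]
    return result


order = ['jan', 'feb', 'mar']
data = {
    'jan': ['John Smith', 'Cat Stevens', 'Andrew Alexander', 'El Gordo Baba'],
    'feb': ['Louis le Roy', 'John Smith'],
    'mar': ['Cat Stevens', 'Louis le Roy'],
}
first_seen, last_seen = first_and_last(data, order)
unsubscribed = unsubscribe_month(last_seen, order)
print(unsubscribed)
```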
Upvotes: 0