Reputation: 1282

Optimise counting changes between elements in a list

I have some working code to track the 'changes' between elements of a list, such that any two consecutive elements that are not identical constitute a change. The code itself is probably the easiest way to explain it.

testlist = ['red','red','blue','red','red','black','yellow','black','yellow','blue']

The first red-to-red pair triggers no change, but the next pair, red to blue, does. I also want to tally the changes to each color.

# Set Tally counters to 0 and a unique key
red = 0
blue = 0
black = 0
yellow = 0
key = 40006

for i in range(len(testlist)-1):
    if (testlist[i] == (testlist[i+1])):
        print("No Change")
    else:
        print("Change to: " + str(testlist[i+1]))
        if testlist[i+1] == 'red':
            red = red + 1
        elif testlist[i+1] == 'blue':
            blue = blue + 1
        elif testlist[i+1] == 'black':
            black = black + 1
        elif testlist[i+1] == 'yellow':
            yellow = yellow + 1
dictfordf = {'key':key, 'red':red,'blue':blue,'black':black,'yellow':yellow}

This works and outputs {'black': 2, 'blue': 2, 'key': 40006, 'red': 1, 'yellow': 2} correctly.

There are only 4 unique colors in this example, but when the number of unique elements grows to 10, the if/elif chain becomes very verbose.

My two questions are:

Is there a more concise way to accomplish this?
Is there a faster way to execute this task?

Upvotes: 0

Answers (3)

Andrej Kesely

Reputation: 195408

My take on the problem:

from collections import Counter

testlist = ['red','red','blue','red','red','black','yellow','black','yellow','blue']

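def changes(data):
    # Yield each element that differs from the element just before it
    if not data:
        return  # an empty list has no changes (and no data[0])
    last = data[0]
    for i in data:
        if last != i:
            yield i
        last = i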

c = Counter(changes(testlist))
c['key'] = 40006
print(dict(c))

Output:

{'yellow': 2, 'red': 1, 'key': 40006, 'blue': 2, 'black': 2}
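The changes() generator above starts from data[0], so it needs an indexable sequence such as a list. If the data might come from a generator or a file instead, one way to adapt it is to pull the first element with next(). This is a sketch; changes_iter is just an illustrative name, not a library function.

```python
from collections import Counter

def changes_iter(iterable):
    """Yield each element that differs from the element just before it.

    Works on any iterable, not only lists, and yields nothing for empty input.
    (changes_iter is an illustrative name, not a library function.)
    """
    it = iter(iterable)
    try:
        last = next(it)      # first element: never counts as a change
    except StopIteration:
        return               # empty input, so no changes
    for item in it:
        if item != last:
            yield item
        last = item

testlist = ['red','red','blue','red','red','black','yellow','black','yellow','blue']

# A generator expression has no [0], but changes_iter() accepts it
c = Counter(changes_iter(color for color in testlist))
c['key'] = 40006
print(dict(c))
```

The counts come out the same as above (blue 2, red 1, black 2, yellow 2, plus the key).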

Upvotes: 2

Alex Reinking

Reputation: 19826

My take uses zip to walk through the list in pairs, and is fairly terse. Like the others, it uses Counter, which I agree is the right tool for the job.

from collections import Counter

testlist = ['red','red','blue','red','red','black','yellow','black','yellow','blue']

def count_changes(data):
    c = Counter()
    c['key'] = 40006
    for item1, item2 in zip(data, data[1:]):
        if item1 != item2:
            c[item2] += 1
    return c

print(count_changes(testlist))

Output:

Counter({'key': 40006, 'blue': 2, 'black': 2, 'yellow': 2, 'red': 1})

It's unclear what the correct behavior should be if "key" appears in the testlist, but it would be straightforward to modify this code to handle that.
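As one example of such a modification, here is a sketch that assumes the safest policy is to refuse input where 'key' would be counted as a change, rather than silently mixing a count into the identifier. The ValueError policy and the key parameter (defaulting to 40006) are assumptions added for illustration; merging, renaming, or storing the identifier separately would be equally valid choices.

```python
from collections import Counter

def count_changes(data, key=40006):
    # Count every element that differs from its predecessor
    c = Counter(item2 for item1, item2 in zip(data, data[1:]) if item1 != item2)
    if 'key' in c:
        # Assumed policy: fail loudly instead of overwriting a real count
        raise ValueError("'key' occurs as a change in data and clashes with the id field")
    c['key'] = key
    return c

testlist = ['red','red','blue','red','red','black','yellow','black','yellow','blue']
print(count_changes(testlist))
```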

Upvotes: 1

abarnert

Reputation: 365627

First, since your goal is to build a dict, just build the dict on the fly, instead of building a bunch of separate variables and then putting them in a dict at the end.

You can also use a Counter instead of a plain dict so you don't need to worry about checking whether the color is already there.

While we're at it, there's no need to call str on something that's already a string, and you've got a bunch of unnecessary parens all over the place.

So:

from collections import Counter
dictfordf = Counter()
dictfordf['key'] = 40006
for i in range(len(testlist)-1):
    if testlist[i] == testlist[i+1]:
        print("No Change")
    else:
        print("Change to: " + testlist[i+1])
        dictfordf[testlist[i+1]] += 1

It's a little hacky to store a value for 'key' that really isn't a count, so you might want to consider using a defaultdict, or setdefault on a normal dict, instead. But I don't think it's too bad.
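For comparison, here is roughly what the two alternatives mentioned above might look like. This is a sketch: it walks the pairs with zip rather than indices and drops the prints to keep it short.

```python
from collections import defaultdict

testlist = ['red','red','blue','red','red','black','yellow','black','yellow','blue']

# Option 1: defaultdict(int) supplies 0 the first time a color is looked up
with_default = defaultdict(int)
with_default['key'] = 40006
for prev, current in zip(testlist, testlist[1:]):
    if prev != current:
        with_default[current] += 1

# Option 2: a plain dict, with setdefault inserting 0 for colors not seen yet
with_setdefault = {'key': 40006}
for prev, current in zip(testlist, testlist[1:]):
    if prev != current:
        with_setdefault.setdefault(current, 0)
        with_setdefault[current] += 1

print(dict(with_default))
print(with_setdefault)
```

Both end up holding the same key and counts as the Counter version; the difference is only that neither type suggests every value is a count.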

Of course, if 'key' could be one of the elements in testlist, a change to that element would increment the 'key' entry. But if that's possible, it's not clear what should happen in that case, so it's not clear how you'd want to fix it.

Meanwhile, you can make things a little more concise by iterating over adjacent pairs; see the pairwise recipe in the itertools docs. Of course, this adds the definition of pairwise to your code (or you can import it from a third-party library like more-itertools or toolz).

So:

from collections import Counter
from itertools import tee

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)

dictfordf = Counter()
dictfordf['key'] = 40006
for prev, current in pairwise(testlist):
    if prev == current:
        print("No Change")
    else:
        print("Change to: " + current)
        dictfordf[current] += 1
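Since Python 3.10, pairwise is also available directly as itertools.pairwise, so the recipe is only needed on older versions. One way to use the built-in when it is present and fall back to the recipe otherwise:

```python
from collections import Counter

try:
    from itertools import pairwise  # built in since Python 3.10
except ImportError:
    from itertools import tee

    def pairwise(iterable):
        "s -> (s0,s1), (s1,s2), (s2, s3), ..."
        a, b = tee(iterable)
        next(b, None)
        return zip(a, b)

testlist = ['red','red','blue','red','red','black','yellow','black','yellow','blue']

dictfordf = Counter()
dictfordf['key'] = 40006
for prev, current in pairwise(testlist):
    if prev == current:
        print("No Change")
    else:
        print("Change to: " + current)
        dictfordf[current] += 1
```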

You can abstract things further by using either groupby or the unique_justseen recipe from itertools. I think this will obscure rather than clarify where you print the outputs, but, assuming you understand the pairwise version, it's worth reading up on both of them and trying to write both alternatives, at least as an exercise.
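Here is one possible shape for those two exercises, as a sketch. It leaves out the "No Change"/"Change to" prints, which is exactly the part that gets harder to express. The unique_justseen below is a simplified version of the itertools-docs recipe without its key argument. In both versions the first run is skipped with islice, because the first element never counts as a change.

```python
from collections import Counter
from itertools import groupby, islice

testlist = ['red','red','blue','red','red','black','yellow','black','yellow','blue']

# groupby yields one (color, run) pair per run of equal neighbours:
# red, blue, red, black, ... ; every run after the first starts with a change
via_groupby = Counter(color for color, _run in islice(groupby(testlist), 1, None))
via_groupby['key'] = 40006

def unique_justseen(iterable):
    """Simplified recipe: yield each element unless it equals the one just seen."""
    return (k for k, _group in groupby(iterable))

via_justseen = Counter(islice(unique_justseen(testlist), 1, None))
via_justseen['key'] = 40006

print(dict(via_groupby))
print(dict(via_justseen))
```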

Upvotes: 4
