Reputation: 13

The number of differences between characters in a string in Python 3

Given a string, let's say "TATA__", I need to find the total number of differences between adjacent characters in that string. That is, there is a difference between T and A, but not between A and A, or between _ and _.

My code more or less tells me this. But when a string such as "TTAA__" is given, it doesn't work as planned.

I need to take a character in that string, and check if the character next to it is not equal to the first character. If it is indeed not equal, I need to add 1 to a running count. If it is equal, nothing is added to the count.

This is what I have so far:

def num_diffs(state):
    count = 0            
    for char in state:
        if char != state[char2]:
            count += 1
    char2 += 1
    return count

When I run it using num_diffs("TATA__") I get 4 as the response. When I run it with num_diffs("TTAA__") I also get 4, whereas the answer should be 2.

If any of that makes sense at all, could anyone help fix it or point out where my error lies? I have a feeling it has to do with state[char2]. Sorry if this seems like a trivial problem; I'm totally new to Python.

Upvotes: 0

Answers (4)

Martin Evans

Reputation: 46779

You might want to investigate Python's groupby function which helps with this kind of analysis.

from itertools import groupby

def num_diffs(seq):
    return len(list(groupby(seq))) - 1

for test in ["TATA__",  "TTAA__"]:
    print(test, num_diffs(test))

This would display:

TATA__ 4
TTAA__ 2

The groupby() function works by grouping consecutive identical entries together. Each item it yields is a key and a group: the key is the repeated entry, and the group is an iterator over the matching entries. So every group after the first marks a point where adjacent characters differ, which is why 1 is subtracted from the number of groups.
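To make the grouping visible, here is a small sketch. The helper names runs and num_diffs_guarded are made up for illustration and are not part of the answer above; the guard for the empty string is an addition of mine, since the one-liner would return -1 for "".

```python
from itertools import groupby

def runs(seq):
    # groupby yields (key, group) pairs; each group is a lazy iterator,
    # so join it into a string to see what it contains
    return [(key, "".join(group)) for key, group in groupby(seq)]

def num_diffs_guarded(seq):
    # Count the runs without building a list, and never go below zero
    n_runs = sum(1 for _ in groupby(seq))
    return max(n_runs - 1, 0)

print(runs("TTAA__"))               # [('T', 'TT'), ('A', 'AA'), ('_', '__')]
print(num_diffs_guarded("TTAA__"))  # 2
print(num_diffs_guarded(""))        # 0
```

Three runs means two boundaries between runs, hence the answer 2.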

Upvotes: 1

Ilja Everilä

Reputation: 52937

import operator

def num_diffs(state):
    return sum(map(operator.ne, state, state[1:]))

To open this up a bit: it maps operator.ne (the function form of !=) over state and state[1:], that is, state beginning at the 2nd character. The map function accepts multiple iterables as arguments and passes elements from those one by one as positional arguments to the given function, until one of the iterables is exhausted (state[1:] in this case will stop first).

The map results in an iterable of boolean values, but since bool in Python inherits from int, True counts as 1 and False as 0 in arithmetic. Here we are interested in the True values, because they represent the points where the adjacent characters differed. Calling sum over that mapping therefore counts them.
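If it helps, the intermediate values can be inspected by turning the map into a list. This is only an illustration using one of the strings from the question; the variable names state and flags are arbitrary.

```python
import operator

state = "TTAA__"

# Compare each character with its right-hand neighbour:
# (T,T) (T,A) (A,A) (A,_) (_,_)
flags = list(map(operator.ne, state, state[1:]))

print(flags)       # [False, True, False, True, False]
print(sum(flags))  # 2, because each True counts as 1
```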

Apart from the string slicing, the whole thing runs using iterators in Python 3. It is possible to use iterators over the string state too, if one wants to avoid slicing huge strings:

import operator
from itertools import islice

def num_diffs(state):
    return sum(map(operator.ne,
                   state,
                   islice(state, 1, len(state))))
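As a small variation (my own sketch, with the made-up name num_diffs_lazy), islice also accepts None as the stop value, which means "until the end", so the len(state) call can be dropped:

```python
import operator
from itertools import islice

def num_diffs_lazy(state):
    # islice(state, 1, None) skips the first character and then runs
    # to the end of the string, without creating a sliced copy
    return sum(map(operator.ne, state, islice(state, 1, None)))

print(num_diffs_lazy("TATA__"))  # 4
print(num_diffs_lazy("TTAA__"))  # 2
```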

Upvotes: 2

Bolo

Reputation: 11690

Trying to make as few modifications to your original code as possible:

def num_diffs(state):
    count = 0            
    for char2 in range(1, len(state)):
        if state[char2 - 1] != state[char2]:
            count += 1      
    return count

One of the problems with your original code was that the char2 variable was not initialized within the body of the function, so it was impossible to predict the function's behaviour.
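For what it's worth, here is what I would expect from the function exactly as it appears in the question, copied under the hypothetical name num_diffs_original. Because char2 is assigned inside the function, Python treats it as a local variable, so reading it before that assignment should raise UnboundLocalError. The question reports getting 4, so the code that was actually run may have differed from what was posted.

```python
def num_diffs_original(state):
    count = 0
    for char in state:
        if char != state[char2]:  # char2 is read here before any assignment
            count += 1
    char2 += 1  # this assignment makes char2 local to the function
    return count

error = None
try:
    num_diffs_original("TATA__")
except UnboundLocalError as exc:
    error = exc

print(type(error).__name__)  # UnboundLocalError
```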

However, working with indices is not the most Pythonic way and it is error prone (see comments for a mistake that I made). You may want to rewrite the function so that it does one loop over a pair of strings, a pair of characters at a time:

def num_diffs(state):
    count = 0
    for char1, char2 in zip(state[:-1], state[1:]):
        if char1 != char2:
            count += 1
    return count
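To see what the loop iterates over, you can list the pairs that zip produces. This is just an illustration with one of your test strings; the name pairs is arbitrary.

```python
state = "TTAA__"

# Each tuple holds a character and its right-hand neighbour
pairs = list(zip(state[:-1], state[1:]))

print(pairs)
# [('T', 'T'), ('T', 'A'), ('A', 'A'), ('A', '_'), ('_', '_')]

# zip stops at the shorter input, so zip(state, state[1:]) gives the same pairs
print(pairs == list(zip(state, state[1:])))  # True
```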

Finally, that very logic can be written much more succinctly; see @Ilja's answer.

Upvotes: -1

khelwood

Reputation: 59166

There are a couple of ways you might do this.
First, you could iterate through the string using an index, and compare each character with the character at the previous index.
Second, you could keep track of the previous character in a separate variable. The second seems closer to your attempt.

def num_diffs(s):
    count = 0
    prev = None
    for ch in s:
        if prev is not None and prev != ch:
            count += 1
        prev = ch
    return count

prev is the character from the previous loop iteration. You assign ch (the current character) to it at the end of each iteration so it will be available in the next. On the first iteration prev is still None, so no comparison is made.
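A quick way to check this approach is to run it on your two examples plus the edge cases. The function is repeated here under the made-up name num_diffs_prev so the snippet runs on its own.

```python
def num_diffs_prev(s):
    count = 0
    prev = None  # no previous character before the loop starts
    for ch in s:
        if prev is not None and prev != ch:
            count += 1
        prev = ch  # remember the current character for the next iteration
    return count

for test in ["TATA__", "TTAA__", "A", ""]:
    print(repr(test), num_diffs_prev(test))
# 'TATA__' 4
# 'TTAA__' 2
# 'A' 0
# '' 0
```

Empty and single-character strings give 0 because the comparison never runs.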

Upvotes: 1
