Compare two sets of strings and then return whole strings that are different in Python 3.4

Question

I'm writing a small API listening program, and I'm trying to figure out when something new has been published. I've figured out most of it, but I'm having a problem on the last step -- where I want to print out something new. I can compare the two lists of items as sets and get the set of letters that's in the right answer, but I can't seem to get the actual strings to print.

Here's the code I wrote to compare the two lists (both new_revised_stuff and old_revised_stuff are lists of strings, like "Bob likes to eat breakfast at http://bobsburgers.com" with a few dozen items per list).

new_stuff = set(new_revised_stuff) - set(old_revised_stuff).intersection(new_revised_stuff)

Which returns:

set('b','o','l'...)

I can get rid of the 'set' notation by writing:

list(new_stuff)

But that doesn't really help. I'd really like it to print out "Bob likes..." if that's a new line.

I've also tried:

new_stuff = []
for a in new_revised_stuff:
    for b in old_revised_stuff:
        if a != b:
            ''.join(a)
            new_stuff.append(a)

Which results in an actual stack overflow, so it's obviously bad code.

abarnert · Accepted Answer

If you want to join any iterable of single characters into a string, you do that with ''.join(new_stuff). For example:

>>> new_stuff = ['b','o','l']
>>> ''.join(new_stuff)
'bol'

However, there are two problems here that are inherent in your design:

Sets only hold unique elements. So, if your string diffs are "Hello, Bob", there's only going to be one o and one l in the set of diffs.
Sets are arbitrarily ordered. So, if your string diffs are "Bob likes", converting that into a set and then back to a string will get you something like 'k iboeBls'.
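
Both problems are easy to see in a short session. This is only an illustrative sketch: the sample string comes from the first point above, and the variable names are made up for the demonstration.

```python
text = "Hello, Bob"

# A set keeps one copy of each character, so the repeated 'l' and 'o' collapse.
unique_chars = set(text)
print(len(text), len(unique_chars))  # prints: 10 8

# Joining the set back into a string gives the surviving characters in
# whatever order the set happens to iterate them; that order is not
# guaranteed to match the original string.
scrambled = ''.join(unique_chars)
print(scrambled)
```

The second print can differ between runs, which is exactly the ordering problem.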

If either of those is a problem (and I suspect they are), you need to rethink your algorithm. You can solve the second one by using an OrderedSet (there's a recipe for that in the collections docs), but the first one is going to be more of a problem.

So, how could you do this?

Well, you don't really need new_revised_stuff to be a set; if you iterate over the characters and keep only the ones that aren't in old_revised_stuff, as long as old_revised_stuff is a set, that's just as efficient as intersecting two sets.
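
If new_revised_stuff and old_revised_stuff really are lists of whole strings, as the question describes, the same idea works one level up. Here is a minimal sketch, assuming invented sample data and a hypothetical helper name, find_new_items:

```python
def find_new_items(new_items, old_items):
    """Return the items of new_items that are not in old_items, in order."""
    old_lookup = set(old_items)  # set membership tests are O(1) on average
    return [item for item in new_items if item not in old_lookup]


# Sample data invented for illustration.
old_revised_stuff = ["Alice posted a photo", "Carol went for a run"]
new_revised_stuff = [
    "Alice posted a photo",
    "Bob likes to eat breakfast at http://bobsburgers.com",
    "Carol went for a run",
]

for line in find_new_items(new_revised_stuff, old_revised_stuff):
    print(line)
```

One thing worth checking: if new_revised_stuff is accidentally a single string rather than a list of strings, iterating over it yields individual characters, which would be consistent with output like set('b','o','l'...).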

But making old_revised_stuff a set will also eliminate any duplicates there, which I don't think you want. What you really want is a "multiset". In Python, the best way to represent that is usually a Counter.
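
As a quick sketch of how a Counter behaves as a multiset (the sample strings are invented for illustration), subtracting one Counter from another keeps only the positive counts:

```python
from collections import Counter

old_counts = Counter("Hello")
new_counts = Counter("Hello, Bob")

# Multiset difference: each character's count in new minus its count in old,
# dropping anything that falls to zero or below.
extra = new_counts - old_counts
print(sorted(extra.items()))
# [(' ', 1), (',', 1), ('B', 1), ('b', 1), ('o', 1)]
```

The result says how many of each character are left over, but not where they appeared, so any ordering still has to come from walking the new string.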

So, I think what you want (maybe) is something like this:

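import collections

old_string = ' to eat breakfast at http://bobsburgers.com'
new_string = 'Bob likes to eat breakfast at http://bobsburgers.com'
# Count how many of each character the old string has available to "use up".
old_chars = collections.Counter(old_string)
new_chars = []
for ch in new_string:
    if old_chars[ch]:
        # This character is accounted for by the old string; consume one copy.
        old_chars[ch] -= 1
    else:
        # No unused copies left in the old string, so it counts as new.
        new_chars.append(ch)
new_string = ''.join(new_chars)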
