qwrty

Reputation: 331

How to get rid of duplicate entries in a comma-separated string

I have a comma-separated string. How do I remove duplicate entries from it in a Pythonic way?

For example the string "a,a,b" should be changed to "a,b".

Upvotes: 6

Views: 8324

Answers (5)

Abhishek Sengupta

Reputation: 3301

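In Java 8 (not Python), you can use streams; distinct() keeps the first occurrence of each entry:

import java.util.Arrays;
import java.util.stream.Collectors;

String words = "hello,hii,hii,bye,hii,word,World";
words = Arrays.stream(words.split(",")).distinct().collect(Collectors.joining(","));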

Output:

words: hello,hii,bye,word,World

Upvotes: 0

alecxe

Reputation: 474081

If the order is important, you can use OrderedDict:

>>> from collections import OrderedDict
>>> s = "a,a,b"
>>> ",".join(OrderedDict.fromkeys(s.split(',')))
'a,b'

Note that this will also handle duplicates that are not next to each other:

>>> s = "a,b,a,a,a,b"
>>> ",".join(OrderedDict.fromkeys(s.split(',')))
'a,b'
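
On Python 3.7 and later, plain dicts also keep insertion order, so the same idea can probably be written without OrderedDict. This is only a sketch: the helper name dedupe_keep_order and its sep parameter are made up for illustration and are not part of the answer above.

```python
def dedupe_keep_order(text, sep=","):
    """Remove duplicate entries, keeping the first occurrence of each in order.

    Hypothetical helper; relies on dicts preserving insertion order (Python 3.7+).
    """
    # dict.fromkeys builds a dict whose keys are the entries, dropping repeats
    return sep.join(dict.fromkeys(text.split(sep)))


print(dedupe_keep_order("a,b,a,a,a,b"))
```

On older Python versions, OrderedDict as shown above remains the safer choice.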

Upvotes: 7

Phil

Reputation: 887

This should do the trick once the string has been split into a list (note that a set does not preserve the original order):

list(set(['a','a','b']))

Upvotes: 0

Andrew Jaffe

Reputation: 27097

You haven't actually specified what you want well enough. As everyone has pointed out: does order matter? Do you want to remove all duplicates, or only runs of consecutive identical entries?

If order doesn't matter, all of the set solutions are fine. If it does, there are itertools recipes for these cases:

# Python 3 names; in Python 2 these were ifilterfalse and imap.
from itertools import filterfalse, groupby
from operator import itemgetter

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

def unique_justseen(iterable, key=None):
    "List unique elements, preserving order. Remember only the element just seen."
    # unique_justseen('AAAABBBCCDAABBB') --> A B C D A B
    # unique_justseen('ABBCcAD', str.lower) --> A B C A D
    return map(next, map(itemgetter(1), groupby(iterable, key)))

You can apply either of these, or a plain set, to 'a,a,b'.split(','). Note that the set version may return the entries in any order:

In [6]: ','.join(set('a,a,b'.split(',')))
Out[6]: 'a,b'

In [7]: ','.join(unique_justseen('a,a,b'.split(',')))
Out[7]: 'a,b'

In [8]: ','.join(unique_everseen('a,a,b'.split(',')))
Out[8]: 'a,b'

or, for a case where they are different:

In [9]: ','.join(set('a,a,b,a'.split(',')))
Out[9]: 'a,b'

In [10]: ','.join(unique_everseen('a,a,b,a'.split(',')))
Out[10]: 'a,b'

In [11]: ','.join(unique_justseen('a,a,b,a'.split(',')))
Out[11]: 'a,b,a'
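
If entries should count as duplicates under some normalization (ignoring case, say), the key idea from unique_everseen can be sketched in a self-contained way. This is a minimal sketch, not an itertools function: dedupe_csv and its key and sep parameters are hypothetical names chosen for illustration.

```python
def dedupe_csv(text, key=None, sep=","):
    """Drop repeated entries from a separated string, keeping first occurrences.

    Hypothetical helper: `key` maps each entry to the value used for comparison.
    """
    seen = set()
    kept = []
    for item in text.split(sep):
        marker = item if key is None else key(item)
        if marker not in seen:
            # First time this (normalized) entry appears: remember and keep it
            seen.add(marker)
            kept.append(item)
    return sep.join(kept)


print(dedupe_csv("a,a,b,a"))
print(dedupe_csv("a,A,b,a", key=str.lower))
```

With a key, the first spelling encountered is the one that survives in the result.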

Upvotes: 0

Konrad Rudolph

Reputation: 545923

Is the order of elements important? If not, the easiest way is to create a set:

result = ','.join(set(text.split(',')))

But as I said, that won’t preserve the order of the original string:

>>> text = 'b,a,b'
>>> ','.join(set(text.split(',')))
'a,b'
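
If the original order doesn't matter but you still want a predictable result, one option is to sort the set before joining, since the iteration order of a set of strings is not something to rely on. A minimal sketch; dedupe_sorted is a hypothetical helper name:

```python
def dedupe_sorted(text, sep=","):
    """Remove duplicate entries and return the rest in sorted order.

    Hypothetical helper; the original ordering of the entries is discarded.
    """
    # set() drops the duplicates, sorted() makes the output deterministic
    return sep.join(sorted(set(text.split(sep))))


print(dedupe_sorted("b,a,b"))
```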

Upvotes: 15
