Reputation: 331
I have a comma-separated string. How do I remove duplicate entries from it in a Pythonic way?
For example, the string "a,a,b" should be changed to "a,b".
Upvotes: 6
Views: 8324
Reputation: 3301
In Java 8, you can do this with the Stream API:
String words = "hello,hii,hii,bye,hii,word,World";
words = Arrays.stream(words.split(",")).distinct().collect(Collectors.joining(","));
Output:
words: hello,hii,bye,word,World
Upvotes: 0
Reputation: 474081
If the order is important, you can use OrderedDict:
>>> from collections import OrderedDict
>>> s = "a,a,b"
>>> ",".join(OrderedDict.fromkeys(s.split(',')))
'a,b'
Note that this will also handle duplicates that are not next to each other:
>>> s = "a,b,a,a,a,b"
>>> ",".join(OrderedDict.fromkeys(s.split(',')))
'a,b'
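As a minimal sketch of the same idea on newer interpreters: since Python 3.7 a plain dict is guaranteed to keep insertion order, so dict.fromkeys can stand in for OrderedDict.fromkeys. The function name dedupe_keep_order is only illustrative and not part of any library.

```python
def dedupe_keep_order(csv_text):
    """Drop repeated fields, keeping the first occurrence of each."""
    # dict keys are unique and, on Python 3.7+, iterate in insertion order.
    return ",".join(dict.fromkeys(csv_text.split(",")))


print(dedupe_keep_order("a,b,a,a,a,b"))  # a,b
print(dedupe_keep_order("b,a,b"))        # b,a
```

On older Python versions, OrderedDict remains the safe choice.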
Upvotes: 7
Reputation: 27097
You haven't specified precisely enough what you want. As others have asked, does order matter? Do you want to remove all duplicates, or only runs of consecutive identical entries?
If order doesn't matter, all of the set solutions are fine. If it does, there are itertools recipes for these cases:
# Python 3 names; on Python 2 these were ifilterfalse and imap.
from itertools import filterfalse, groupby
from operator import itemgetter

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

def unique_justseen(iterable, key=None):
    "List unique elements, preserving order. Remember only the element just seen."
    # unique_justseen('AAAABBBCCDAABBB') --> A B C D A B
    # unique_justseen('ABBCcAD', str.lower) --> A B C A D
    return map(next, map(itemgetter(1), groupby(iterable, key)))
You can apply either of these to 'a,a,b'.split(','):
In [6]: ','.join(set('a,a,b'.split(',')))
Out[6]: 'a,b'
In [7]: ','.join(unique_justseen('a,a,b'.split(',')))
Out[7]: 'a,b'
In [8]: ','.join(unique_everseen('a,a,b'.split(',')))
Out[8]: 'a,b'
Or, for an input where the results differ:
In [9]: ','.join(set('a,a,b,a'.split(',')))
Out[9]: 'a,b'
In [10]: ','.join(unique_everseen('a,a,b,a'.split(',')))
Out[10]: 'a,b'
In [11]: ','.join(unique_justseen('a,a,b,a'.split(',')))
Out[11]: 'a,b,a'
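If only runs of adjacent duplicates need collapsing, a shorter sketch is to take the keys that itertools.groupby yields directly. This should behave like unique_justseen without a key function; the name collapse_runs is made up for illustration.

```python
from itertools import groupby


def collapse_runs(csv_text):
    """Collapse consecutive repeated fields into a single field."""
    fields = csv_text.split(",")
    # groupby yields one (key, group) pair per run of equal adjacent items.
    return ",".join(key for key, _group in groupby(fields))


print(collapse_runs("a,a,b"))    # a,b
print(collapse_runs("a,a,b,a"))  # a,b,a
```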
Upvotes: 0
Reputation: 545923
Is the order of elements important? If not, the easiest way is to create a set:
result = ','.join(set(text.split(',')))
But as I said, that won’t preserve the order of the original string:
>>> text = 'b,a,b'
>>> ','.join(set(text.split(',')))
'a,b'
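A set has no defined iteration order, so the exact output of the join can vary between runs and Python versions. If the original order is irrelevant but a stable result is wanted, one option is to sort the unique fields; this is a sketch under that assumption, and dedupe_sorted is a hypothetical helper name.

```python
def dedupe_sorted(csv_text):
    """Drop duplicate fields and return the rest in sorted order."""
    # set() removes duplicates; sorted() makes the output deterministic.
    return ",".join(sorted(set(csv_text.split(","))))


print(dedupe_sorted("b,a,b"))  # a,b
```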
Upvotes: 15