Reputation: 8554
I have a table like this:
col1 col2
ben US-US-Uk
Man Uk-NL-DE
bee CA-CO-MX-MX
How can I remove the duplicate values within each cell of col2, so that I get a table like this?
col1 col2
ben US-Uk
Man Uk-NL-DE
bee CA-CO-MX
I have tried this:
a.cc.str.split('-').unique()
but I get the following error:
TypeError: unhashable type: 'list'
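The error occurs because unique() has to hash every element of the Series. After str.split('-'), each element is a Python list, and lists are mutable and therefore unhashable. A minimal sketch of the underlying cause in plain Python (no pandas needed). Converting each list to a tuple would make the elements hashable, but unique() would then deduplicate whole rows rather than the codes inside each row:

```python
# Lists are mutable, so Python refuses to hash them; unique() relies on hashing.
parts = "US-US-Uk".split("-")  # ['US', 'US', 'Uk']

try:
    hash(parts)
    error = None
except TypeError as exc:
    error = str(exc)

print(error)  # unhashable type: 'list'

# A tuple of strings is hashable, but unique() over tuples would compare
# whole rows, not the individual codes inside each row.
print(hash(tuple(parts)) == hash(("US", "US", "Uk")))  # True
```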
Does anybody know how to do this?
Upvotes: 1
Views: 1561
Reputation: 22463
I like @EdChum's answer, but reordering the values is disconcerting: it can make both visual inspection and mechanical comparison more difficult.
Unfortunately, Python doesn't have an ordered set, which would be the perfect tool here. So:
def unique(items):
    """
    Return unique items in a list, in the same order they were
    originally.
    """
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            result.append(item)
            seen.add(item)
    return result

df.col2 = df.col2.apply(lambda x: '-'.join(unique(x.split('-'))))
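For reference, a self-contained version of the above that rebuilds the sample table from the question. The DataFrame construction is an assumption, since the question does not show how the data was loaded:

```python
import pandas as pd


def unique(items):
    """Return unique items in a list, in the order they first appeared."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            result.append(item)
            seen.add(item)
    return result


# Assumed reconstruction of the sample table from the question.
df = pd.DataFrame({
    "col1": ["ben", "Man", "bee"],
    "col2": ["US-US-Uk", "Uk-NL-DE", "CA-CO-MX-MX"],
})

# Split each string, keep the first occurrence of each code, and join back.
df["col2"] = df["col2"].apply(lambda x: "-".join(unique(x.split("-"))))
print(df)
```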
An alternative way of creating an ordered set is with OrderedDict:
from collections import OrderedDict

def u2(items):
    od = OrderedDict.fromkeys(items)
    return list(od.keys())
You can then use u2 instead of unique. Either way, the results are:
col1 col2
0 ben US-Uk
1 Man Uk-NL-DE
2 bee CA-CO-MX
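On Python 3.7 and later, plain dicts are guaranteed to preserve insertion order, so dict.fromkeys can stand in for OrderedDict.fromkeys. A minimal sketch; the helper name u3 is hypothetical:

```python
def u3(items):
    # Hypothetical helper: dict keys are unique and, since Python 3.7,
    # iterate in insertion order, so this keeps the first occurrence of each item.
    return list(dict.fromkeys(items))


codes = "CA-CO-MX-MX".split("-")
print("-".join(u3(codes)))  # CA-CO-MX
```

It can be dropped into the same apply call in place of unique or u2.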
Upvotes: 2
Reputation: 394099
You can use apply to call a lambda function that splits the string and then joins on the unique values:
In [10]:
df['col2'] = df['col2'].apply(lambda x: '-'.join(set(x.split('-'))))
df
Out[10]:
col1 col2
0 ben Uk-US
1 Man Uk-NL-DE
2 bee CA-CO-MX
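A self-contained sketch of the set-based approach, again assuming the sample table is built by hand. Note that set() does not keep the original order, and because string hashing is randomized per process by default in Python 3, the joined order (for example Uk-US versus US-Uk) can differ between runs:

```python
import pandas as pd

# Assumed reconstruction of the sample table from the question.
df = pd.DataFrame({
    "col1": ["ben", "Man", "bee"],
    "col2": ["US-US-Uk", "Uk-NL-DE", "CA-CO-MX-MX"],
})

# set() drops duplicates, but its iteration order is arbitrary, so the
# order of the codes in each joined string is not guaranteed.
df["col2"] = df["col2"].apply(lambda x: "-".join(set(x.split("-"))))
print(df)
```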
Another method:
In [22]:
df['col2'].str.split('-').apply(lambda x: '-'.join(set(x)))
Out[22]:
0 Uk-US
1 Uk-NL-DE
2 CA-CO-MX
Name: col2, dtype: object
Timings (on this three-row sample, splitting inside the lambda was faster):
In [24]:
%timeit df['col2'].str.split('-').apply(lambda x: '-'.join(set(x)))
%timeit df['col2'] = df['col2'].apply(lambda x: '-'.join(set(x.split('-'))))
1000 loops, best of 3: 418 µs per loop
1000 loops, best of 3: 246 µs per loop
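The %timeit lines are IPython magics. Outside IPython, a rough equivalent can be sketched with the standard timeit module. The sample frame and the function names below are assumptions, both functions return a new Series instead of assigning back (so every call sees the original data), and absolute numbers will depend on the machine and pandas version:

```python
import timeit

import pandas as pd

# Assumed reconstruction of the sample table from the question.
df = pd.DataFrame({
    "col1": ["ben", "Man", "bee"],
    "col2": ["US-US-Uk", "Uk-NL-DE", "CA-CO-MX-MX"],
})


def via_str_split():
    # Split with the .str accessor first, then join the set of each list.
    return df["col2"].str.split("-").apply(lambda x: "-".join(set(x)))


def via_apply_only():
    # Split and join inside a single Python-level lambda.
    return df["col2"].apply(lambda x: "-".join(set(x.split("-"))))


n = 1000
for func in (via_str_split, via_apply_only):
    # Best of 3 repeats of n calls each, reported per call in microseconds.
    best = min(timeit.repeat(func, number=n, repeat=3)) / n
    print(f"{func.__name__}: {best * 1e6:.0f} µs per loop")
```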
Upvotes: 2
Reputation: 3092
Try this:
col2 = 'CA-CO-MX-MX'
print('-'.join(set(col2.split('-'))))
Upvotes: 1