Reputation: 14415
Is there an easy way to pull out the distinct combinations of values in a dataframe? I've used pd.Series.unique() for single columns, but what about multiple columns?
Example data:
import pandas as pd
df = pd.DataFrame(data=[[1, 'a'], [2, 'a'], [3, 'b'], [3, 'b'], [1, 'b'], [1, 'b']],
                  columns=['number', 'letter'])
Expected output:
(1, a)
(2, a)
(3, b)
(1, b)
Ideally, I'd like a separate Series object of tuples with the distinct values.
Upvotes: 5
Views: 1919
Reputation: 394041
You can set the index to those columns and then call unique() on the index:
In [165]:
idx = df.set_index(['number','letter']).index
idx.unique()
Out[165]:
array([(1, 'a'), (2, 'a'), (3, 'b'), (1, 'b')], dtype=object)
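In recent pandas versions, MultiIndex.unique() returns a MultiIndex rather than the NumPy array printed above, so the displayed output will differ. Since the question asks for a Series of tuples, here is a minimal sketch that converts the unique index into one. It assumes a reasonably current pandas; the variable names unique_pairs and pairs are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(data=[[1, 'a'], [2, 'a'], [3, 'b'], [3, 'b'], [1, 'b'], [1, 'b']],
                  columns=['number', 'letter'])

# Build a MultiIndex from the two columns and keep only the distinct
# combinations, in order of first appearance.
idx = df.set_index(['number', 'letter']).index
unique_pairs = idx.unique()

# tolist() on a MultiIndex yields a list of tuples; wrap it in a Series
# (object dtype, one tuple per element).
pairs = pd.Series(unique_pairs.tolist())
print(pairs)
```

Unlike a set, this keeps the row order in which each combination first occurs.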
Upvotes: 5
Reputation: 109546
You can zip the columns and create a set:
>>> set(zip(df.number, df.letter))
{(1, 'a'), (1, 'b'), (2, 'a'), (3, 'b')}
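Because a set is unordered, the pairs come back in arbitrary order. If first-appearance order matters and a Series is wanted, one possible variation (a sketch that relies on dicts preserving insertion order, guaranteed since Python 3.7) is to deduplicate the zipped pairs with dict.fromkeys instead of set:

```python
import pandas as pd

df = pd.DataFrame(data=[[1, 'a'], [2, 'a'], [3, 'b'], [3, 'b'], [1, 'b'], [1, 'b']],
                  columns=['number', 'letter'])

# zip pairs up the two columns row by row; dict.fromkeys drops duplicate
# keys while keeping the order in which each pair was first seen.
ordered = pd.Series(list(dict.fromkeys(zip(df.number, df.letter))))
print(ordered)
```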
Upvotes: 7