I want to combine/concatenate values from two columns of a dataframe and compare the uniqueness of these values, e.g.
      col1   col2
row1  val11  val12
row2  val21  val22
row3  val31  val32
I want to concatenate val11 and val12, val21 and val22, val31 and val32, then compare val11+val12, val21+val22 and val31+val32 for uniqueness, i.e. to check if the three concatenated values are equal.
The dtype of both col1 and col2 is str.
I am wondering what the best way to do this is.
Reputation: 863741
You can use duplicated to check the uniqueness of the concatenated columns col1 and col2, together with boolean indexing:
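print(df)
      col1   col2
row1  val11  val12
row2  val21  val22
row3  val31  val32
row3  val31  val32
# concatenate the two string columns element-wise
ser = df.col1 + df.col2
print(ser)
row1    val11val12
row2    val21val22
row3    val31val32
row3    val31val32
dtype: object
# keep=False marks every occurrence of a repeated value as a duplicate
print(ser.duplicated(keep=False))
row1    False
row2    False
row3     True
row3     True
dtype: bool
# invert the mask to flag the values that occur only once
print(~ser.duplicated(keep=False))
row1     True
row2     True
row3    False
row3    False
dtype: bool
# boolean indexing keeps only the rows whose concatenated value is unique
print(df[~ser.duplicated(keep=False)])
      col1   col2
row1  val11  val12
row2  val21  val22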
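For anyone who wants to run this end to end, here is a minimal self-contained sketch of the steps above. The DataFrame construction is my assumption, rebuilt from the printed output, and the names all_unique, all_equal, dup_mask, unique_rows and pair_mask are only illustrative. It also shows two yes/no checks on the concatenated Series (is_unique and nunique), since the question asks about uniqueness and equality of the values as a whole.

```python
import pandas as pd

# Assumed sample data, reconstructed from the output printed above
df = pd.DataFrame(
    {
        "col1": ["val11", "val21", "val31", "val31"],
        "col2": ["val12", "val22", "val32", "val32"],
    },
    index=["row1", "row2", "row3", "row3"],
)

# Element-wise string concatenation of the two columns
ser = df["col1"] + df["col2"]

# Single yes/no answers about the concatenated values
all_unique = ser.is_unique        # True only if no value repeats
all_equal = ser.nunique() == 1    # True only if every value is the same

# Per-row mask: keep=False flags every occurrence of a repeated value
dup_mask = ser.duplicated(keep=False)
unique_rows = df[~dup_mask]

# Alternative without concatenation: compare the (col1, col2) pairs directly
pair_mask = df.duplicated(subset=["col1", "col2"], keep=False)

print(all_unique, all_equal)
print(unique_rows)
```

One design note: plain concatenation can make different pairs collide (for example "ab" + "c" and "a" + "bc" both give "abc"). If that can happen in your data, comparing the column pairs with df.duplicated(subset=[...]) as in the last step, or concatenating with a separator that never appears in the values, is probably the safer choice.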