daiyue

Reputation: 7458

Pandas: combine values from two columns and compare for uniqueness

I want to combine/concatenate values from two columns of a dataframe and compare the uniqueness of these values, e.g.

      col1    col2
row1  val11   val12
row2  val21   val22
row3  val31   val32

I want to concatenate val11 and val12, val21 and val22, val31 and val32, then compare val11+val12, val21+val22 and val31+val32 for uniqueness, i.e. to check if the three concatenated values are equal.

Both col1 and col2 have dtype str.

I am wondering what the best way to do this is.

Upvotes: 1

Views: 45

Answers (1)

jezrael

Reputation: 863741

You can use duplicated to check the uniqueness of the concatenated columns col1 and col2, and then filter the rows with boolean indexing:

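print(df)
       col1   col2
row1  val11  val12
row2  val21  val22
row3  val31  val32
row3  val31  val32

# Concatenate the two string columns element-wise
ser = df.col1 + df.col2
print(ser)
row1    val11val12
row2    val21val22
row3    val31val32
row3    val31val32
dtype: object

# keep=False marks every occurrence of a repeated value as True
print(ser.duplicated(keep=False))
row1    False
row2    False
row3     True
row3     True
dtype: bool

# Invert the mask so that unique values are True
print(~ser.duplicated(keep=False))
row1     True
row2     True
row3    False
row3    False
dtype: bool

# Boolean indexing keeps only the rows whose combined value is unique
print(df[~ser.duplicated(keep=False)])
       col1   col2
row1  val11  val12
row2  val21  val22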
Upvotes: 1
