Reputation: 1691
I have a dataframe whose number of columns can vary (2-50). For example, with 2 columns as below, I want to remove rows where site1 and site2 are equal.
df = pd.DataFrame([[507814, 501972], [529389, 529389], [508110, 508161]], columns = ['site1', 'site2'])
I want to drop rows where the column values are all the same. For two columns this can be done with:
df[df["site1"] != df["site2"]]
However, since I do not have a fixed number of columns and this code runs inside a loop, I need the fastest way to do this.
I appreciate the help in advance.
Thanks.
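For reference, a runnable version of the two-column case from the question; the filter shown above keeps rows 0 and 2 and drops row 1, where both sites match:

```python
import pandas as pd

df = pd.DataFrame([[507814, 501972], [529389, 529389], [508110, 508161]],
                  columns=['site1', 'site2'])

# Row 1 (529389 == 529389) is dropped; rows 0 and 2 remain.
out = df[df["site1"] != df["site2"]]
print(out)
```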
Upvotes: 1
Views: 63
Reputation: 8768
Here is another way. This should work if all your site values are numbers.
df.loc[df.diff(axis=1).sum(axis=1).ne(0)]
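A quick check of this on the question's example df; the caveat in the final comment is my own observation, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame([[507814, 501972], [529389, 529389], [508110, 508161]],
                  columns=['site1', 'site2'])

# diff(axis=1) computes the per-row difference between adjacent columns;
# a row of identical values sums to 0, so ne(0) keeps only differing rows.
out = df.loc[df.diff(axis=1).sum(axis=1).ne(0)]
print(out)

# Caveat: with 3+ columns, differences can cancel (e.g. a row [1, 2, 1]
# sums to 0), so a row could be dropped even though its values are not
# all equal.
```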
Upvotes: 1
Reputation: 195408
If you have more columns, you can use set() + len():
x = df[~df.apply(lambda x: len(set(x)), axis=1).eq(1)]
print(x)
Prints:
site1 site2
0 507814 501972
2 508110 508161
Edit: To specify columns:
x = df[~df[["site1", "site2"]].apply(lambda x: len(set(x)), axis=1).eq(1)]
print(x)
Prints:
site1 site2 site3
0 507814 501972 508284
2 508110 508161 508098
df used:
site1 site2 site3
0 507814 501972 508284
1 529389 529389 508284
2 508110 508161 508098
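The same "are the selected values all equal" check can also be written with nunique, which avoids the Python-level lambda; a sketch (my variant, not from the answer) on the same 3-column df, restricted to site1 and site2:

```python
import pandas as pd

df = pd.DataFrame([[507814, 501972, 508284],
                   [529389, 529389, 508284],
                   [508110, 508161, 508098]],
                  columns=['site1', 'site2', 'site3'])

# nunique(axis=1) counts distinct values per row over the chosen columns;
# keep rows where there is more than one distinct value.
x = df[df[["site1", "site2"]].nunique(axis=1).ne(1)]
print(x)
```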
Upvotes: 2
Reputation: 1644
Using your example, this filters out the rows where site1 == site2:
# first option
df[~df.apply(lambda x: x["site1"] == x["site2"], axis=1)]
# second option
df.query("site1 != site2")
Both options give you:
site1 site2
0 507814 501972
2 508110 508161
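Since apply with axis=1 invokes a Python function per row, a direct vectorized comparison is usually faster for this two-column case; a sketch on the same example:

```python
import pandas as pd

df = pd.DataFrame([[507814, 501972], [529389, 529389], [508110, 508161]],
                  columns=['site1', 'site2'])

# Vectorized element-wise comparison; no per-row Python callback.
out = df[df["site1"].ne(df["site2"])]
print(out)
```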
Upvotes: 0