Reputation: 1691
I have a dataframe whose number of columns can vary (2-50). For example, with 2 columns as below, I want to remove rows where site1 and site2 are equal.
df = pd.DataFrame([[507814, 501972], [529389, 529389], [508110, 508161]], columns = ['site1', 'site2'])
I want to drop rows where the column values are all the same. For two columns this can be done with:
df[df["site1"] != df["site2"]]
However, since I do not have a fixed number of columns and this code runs inside a loop, I need the fastest way to do this.
I appreciate the help in advance.
Thanks.
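For reference, a runnable version of the two-column case from the question; the filter shown above keeps rows 0 and 2 and drops row 1, where both sites match:

```python
import pandas as pd

df = pd.DataFrame([[507814, 501972], [529389, 529389], [508110, 508161]],
                  columns=['site1', 'site2'])

# Row 1 (529389 == 529389) is dropped; rows 0 and 2 remain.
out = df[df["site1"] != df["site2"]]
print(out)
```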
Upvotes: 1
Views: 63
Reputation: 8768
Here is another way. This should work if all your site values are numbers.
df.loc[df.diff(axis=1).sum(axis=1).ne(0)]
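A quick check of this on the question's example df; the caveat in the final comment is my own observation, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame([[507814, 501972], [529389, 529389], [508110, 508161]],
                  columns=['site1', 'site2'])

# diff(axis=1) computes the per-row difference between adjacent columns;
# a row of identical values sums to 0, so ne(0) keeps only differing rows.
out = df.loc[df.diff(axis=1).sum(axis=1).ne(0)]
print(out)

# Caveat: with 3+ columns, differences can cancel (e.g. a row [1, 2, 1]
# sums to 0), so a row could be dropped even though its values are not
# all equal.
```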
Upvotes: 1
Reputation: 195408
If you have more columns, you can use set() + len():
x = df[~df.apply(lambda x: len(set(x)), axis=1).eq(1)]
print(x)
Prints:
site1 site2
0 507814 501972
2 508110 508161
Edit: To specify columns:
x = df[~df[["site1", "site2"]].apply(lambda x: len(set(x)), axis=1).eq(1)]
print(x)
Prints:
site1 site2 site3
0 507814 501972 508284
2 508110 508161 508098
df used:
site1 site2 site3
0 507814 501972 508284
1 529389 529389 508284
2 508110 508161 508098
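The same "are the selected values all equal" check can also be written with nunique, which avoids the Python-level lambda; a sketch (my variant, not from the answer) on the same 3-column df, restricted to site1 and site2:

```python
import pandas as pd

df = pd.DataFrame([[507814, 501972, 508284],
                   [529389, 529389, 508284],
                   [508110, 508161, 508098]],
                  columns=['site1', 'site2', 'site3'])

# nunique(axis=1) counts distinct values per row over the chosen columns;
# keep rows where there is more than one distinct value.
x = df[df[["site1", "site2"]].nunique(axis=1).ne(1)]
print(x)
```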
Upvotes: 2
Reputation: 1644
Using your example, this filters out the rows where site1 == site2:
# first option
df[~df.apply(lambda x: x["site1"] == x["site2"], axis=1)]
# second option
df.query("site1 != site2")
Both options give you:
site1 site2
0 507814 501972
2 508110 508161
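Since apply with axis=1 invokes a Python function per row, a direct vectorized comparison is usually faster for this two-column case; a sketch on the same example:

```python
import pandas as pd

df = pd.DataFrame([[507814, 501972], [529389, 529389], [508110, 508161]],
                  columns=['site1', 'site2'])

# Vectorized element-wise comparison; no per-row Python callback.
out = df[df["site1"].ne(df["site2"])]
print(out)
```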
Upvotes: 0