SMJune

Reputation: 407

Pandas to remove value if it exists in any other column in the same row

I have a dataframe:

df = pd.DataFrame({'c1': ["dog", "cat", "bird"], 'c2': ["rabbit", "rat", "snake"], 'c3': ["dog", "fish", "snake"]})

It looks like:

     c1      c2     c3
0   dog  rabbit    dog
1   cat     rat   fish
2  bird   snake  snake

Whenever a value in c3 also appears in another column of the same row, I want to replace that c3 value with a blank. Like this:

     c1      c2    c3
0   dog  rabbit
1   cat     rat  fish
2  bird   snake

Here's what I have tried:

df["c3"] = df.apply(lambda x: x if x.c1 or x.c2 not in x.c3 else Nan, axis = 1)

But this throws an error:

TypeError: argument of type 'numpy.int64' is not iterable

Upvotes: 3

Views: 1060

Answers (2)

Brendan

Reputation: 4011

Another approach based on testing equality across columns and replacing with np.where:

# True for rows where c3 equals any other column in the same row
dup = df.drop(columns='c3').eq(df['c3'], axis=0).any(axis=1)
df['c3'] = np.where(dup, "", df['c3'])
print(df)
     c1      c2    c3
0   dog  rabbit      
1   cat     rat  fish
2  bird   snake      
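
For anyone who wants to see the intermediate steps, here is a minimal self-contained sketch of the same equality test using the question's data. The names `others`, `matches`, and `mask` are only illustrative, and it swaps `np.where` for `Series.mask`, which is an alternative way to do the replacement rather than what the answer above uses.

```python
import pandas as pd

df = pd.DataFrame({
    "c1": ["dog", "cat", "bird"],
    "c2": ["rabbit", "rat", "snake"],
    "c3": ["dog", "fish", "snake"],
})

# Compare every column except c3 against c3, row by row.
others = df.drop(columns="c3")
matches = others.eq(df["c3"], axis=0)  # boolean frame, same shape as `others`
mask = matches.any(axis=1)             # True where c3 repeats elsewhere in the row

# Series.mask replaces the values where the condition is True.
df["c3"] = df["c3"].mask(mask, "")
print(df)
```

Because the comparison is vectorised, this should avoid the per-row Python call that `apply(..., axis=1)` makes, which tends to matter on larger frames.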

Upvotes: 2

Andrej Kesely

Reputation: 195428

You can use Series.value_counts() + .apply:

df["c3"] = df.apply(
    lambda x: "" if x.value_counts()[x["c3"]] > 1 else x["c3"], axis=1
)
print(df)

Prints:

     c1      c2    c3
0   dog  rabbit      
1   cat     rat  fish
2  bird   snake      
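
To illustrate why the `> 1` check works, here is a small sketch of what `value_counts()` gives for a single row (the first row of the question's data, rebuilt by hand as a Series named `row` purely for illustration). The c3 value always counts itself once, so a count above 1 means it also occurs in another column.

```python
import pandas as pd

# First row of the example frame, written out as a Series.
row = pd.Series({"c1": "dog", "c2": "rabbit", "c3": "dog"})

counts = row.value_counts()  # occurrences of each distinct value in the row
print(counts)

# c3 is "dog", which appears twice in this row, so it would be blanked.
print(counts[row["c3"]] > 1)
```

Note that `value_counts()` drops NaN by default, so rows where c3 is missing would probably need separate handling before using this lookup.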

Upvotes: 1
