Pandas: remove a value if it exists in any other column in the same row

Question

I have a dataframe:

import pandas as pd

df = pd.DataFrame({'c1': ["dog", "cat", "bird"], 'c2': ["rabbit", "rat", "snake"], 'c3': ["dog", "fish", "snake"]})

It looks like:

     c1      c2     c3
0   dog  rabbit    dog
1   cat     rat   fish
2  bird   snake  snake

Whenever the value in c3 also appears in another column of the same row, I want to replace that c3 value with a blank. Like this:

     c1      c2    c3
0   dog  rabbit
1   cat     rat  fish
2  bird   snake

Here's what I have tried:

df["c3"] = df.apply(lambda x: x if x.c1 or x.c2 not in x.c3 else Nan, axis = 1)

But this throws an error:

TypeError: argument of type 'numpy.int64' is not iterable
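
A note on why the attempt misbehaves, as a minimal sketch with plain strings standing in for one row (the variable names are illustrative, not from the question). In Python, `not in` binds more tightly than `or`, so the condition is read as `x.c1 or (x.c2 not in x.c3)`, and `in` is a containment test on the right-hand operand rather than an equality check. The reported TypeError is presumably that containment test being applied to a scalar numpy.int64 in the real data, which cannot be iterated.

```python
# Plain strings standing in for the first row of the example frame.
c1, c2, c3 = "dog", "rabbit", "dog"

# `not in` binds tighter than `or`, so this is: c1 or (c2 not in c3).
# A non-empty string is truthy, so `or` short-circuits and returns c1 itself.
as_written = c1 or c2 not in c3

# What the question describes: does the c3 value equal c1 or c2?
intended = c3 in (c1, c2)

print(repr(as_written))  # 'dog'
print(intended)          # True
```

The lambda also returns `x` (the whole row) rather than `x.c3`, and `Nan` is not a defined name, so the expression would need changes beyond the condition.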

Andrej Kesely · Accepted Answer

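You can combine DataFrame.apply with Series.value_counts(). For each row, count how often the c3 value occurs in that row and blank it when the count is greater than 1: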

df["c3"] = df.apply(
    lambda x: "" if x.value_counts()[x["c3"]] > 1 else x["c3"], axis=1
)
print(df)

Prints:

     c1      c2    c3
0   dog  rabbit      
1   cat     rat  fish
2  bird   snake
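
As an alternative to the row-wise apply above, the same result can be reached with a column-wise comparison. This is only a sketch, assuming every column other than c3 should be checked and that an exact equality match is what is wanted; the names `other_cols` and `duplicated` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "c1": ["dog", "cat", "bird"],
    "c2": ["rabbit", "rat", "snake"],
    "c3": ["dog", "fish", "snake"],
})

# All columns except c3 (assumption: each of them should be compared).
other_cols = df.columns.difference(["c3"])

# Compare every other column to c3 row by row; True where any of them matches.
duplicated = df[other_cols].eq(df["c3"], axis=0).any(axis=1)

# Blank out c3 only in the rows where a match was found.
df.loc[duplicated, "c3"] = ""
print(df)
```

Because the comparison runs on whole columns instead of calling a Python function per row, it tends to scale better on large frames.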
