Reputation: 407
I have a dataframe:
df = pd.DataFrame({'c1': ["dog", "cat", "bird"], 'c2': ["rabbit", "rat", "snake"], 'c3': ["dog", "fish", "snake"]})
It looks like:
     c1      c2     c3
0   dog  rabbit    dog
1   cat     rat   fish
2  bird   snake  snake
Whenever the value in c3 also appears in another column of the same row, I want to replace the c3 value with a blank, like this:
     c1      c2    c3
0   dog  rabbit
1   cat     rat  fish
2  bird   snake
Here's what I have tried:
df["c3"] = df.apply(lambda x: x if x.c1 or x.c2 not in x.c3 else Nan, axis = 1)
But this throws an error:
TypeError: argument of type 'numpy.int64' is not iterable
Upvotes: 3
Views: 1060
Reputation: 4011
Another approach, based on testing equality across columns and replacing with np.where:
df['c3'] = np.where(df.drop(columns='c3').eq(df['c3'], axis=0).any(axis=1),
                    "", df['c3'])
     c1      c2    c3
0   dog  rabbit
1   cat     rat  fish
2  bird   snake
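For anyone who wants to see what each step contributes, here is the same idea split into intermediate variables, as a minimal self-contained sketch. The names `others`, `matches`, and `duplicated` are only illustrative and not part of the original one-liner.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "c1": ["dog", "cat", "bird"],
    "c2": ["rabbit", "rat", "snake"],
    "c3": ["dog", "fish", "snake"],
})

# Every column except c3 (illustrative name).
others = df.drop(columns="c3")

# Element-wise comparison of each remaining column against c3, aligned by row.
matches = others.eq(df["c3"], axis=0)

# True for rows where c3 equals at least one other value in that row.
duplicated = matches.any(axis=1)

# Blank out c3 where a match was found, keep it otherwise.
df["c3"] = np.where(duplicated, "", df["c3"])
print(df)
```

Because the comparison is vectorized, this avoids a Python-level loop over rows, which should matter on larger frames.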
Upvotes: 2
Reputation: 195428
You can use Series.value_counts() + .apply:
df["c3"] = df.apply(
    lambda x: "" if x.value_counts()[x["c3"]] > 1 else x["c3"], axis=1
)
print(df)
Prints:
     c1      c2    c3
0   dog  rabbit
1   cat     rat  fish
2  bird   snake
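As a side note on the attempt in the question: `x.c1 or x.c2 not in x.c3` is parsed as `x.c1 or (x.c2 not in x.c3)`, so it does not test both columns, the lambda returns the whole row `x` rather than a single value, and `Nan` is not a defined name. A row-wise version closer to that attempt might look like the sketch below. The helper name `blank_if_repeated` is made up for illustration, and it returns `""` rather than NaN to match the desired output.

```python
import pandas as pd

df = pd.DataFrame({
    "c1": ["dog", "cat", "bird"],
    "c2": ["rabbit", "rat", "snake"],
    "c3": ["dog", "fish", "snake"],
})

def blank_if_repeated(row):
    # Hypothetical helper: compare c3 against the other values in this row.
    others = row.drop("c3").tolist()
    return "" if row["c3"] in others else row["c3"]

# axis=1 passes each row to the helper as a Series.
df["c3"] = df.apply(blank_if_repeated, axis=1)
print(df)
```

Unlike the value_counts() version, this checks membership directly, so a duplicate between c1 and c2 alone never affects the result.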
Upvotes: 1