Reputation: 2155
I have a data frame
IN_1 IN_2 C_Red C_Blue C_Green
0 Blue Red 0 0 0
1 Red Green 0 Blue Green
2 Green Red 0 Blue 0
3 Red Blue Red 0 0
I want to say if any of the values in C_Red
or C_Blue
or C_Green
, appear in either IN_1
or IN_2
, then give a 0 to my output column.
IN_1 IN_2 C_Red C_Blue C_Green OUTPUT
0 Blue Red 0 0 0 1
1 Red Green 0 Blue Green 0
2 Green Red 0 Blue 0 1
3 Red Blue Red 0 0 0
so far I've done this
df.apply(lambda x: [(c in [C_Red,C_Blue,C_Green]) for c in df.loc[x,[IN_1, IN_2]].values] )
But can't quite finish it off
Upvotes: 1
Views: 71
Reputation: 28644
A list comprehension fits in here :
in_cols = df.columns[df.columns.str.startswith("IN")]
c_cols = df.columns[df.columns.str.startswith("C")]
df["output"] = [0 if set(val.array).intersection(val1.array) else 1
for (key, val), (key1, val1)
in zip(df[in_cols].T.items(), df[c_cols].T.items())
]
IN_1 IN_2 C_Red C_Blue C_Green output
0 Blue Red 0 0 0 1
1 Red Green 0 Blue Green 0
2 Green Red 0 Blue 0 1
3 Red Blue Red 0 0 0
If your columns are fixed, then you can do the iteration with numpy and hard coded values :
df['output'] = [0
if set(entry[:2]).intersection(entry[2:])
else 1
for entry in df.to_numpy()]
Upvotes: 1
Reputation: 862581
Use set
s for check intersection with convert output to bool for False
if empty set and then pass to numpy.where
:
m = df.apply(lambda x: bool(set(x[['IN_1','IN_2']]) &
set(x[['C_Red','C_Blue', 'C_Green']])), axis=1)
Another solution with Series.isin
:
f = lambda x: x[['IN_1','IN_2']].isin(x[['C_Red','C_Blue', 'C_Green']])
m = df.apply(f, axis=1).any(axis=1)
Or solution with list comprehension:
zipped = zip(df[['IN_1','IN_2']].values, df[['C_Red','C_Blue', 'C_Green']].values)
m = [bool(set(y) & set(x)) for x, y in zipped]
df['OUTPUT'] = np.where(m, 0, 1)
print (df)
IN_1 IN_2 C_Red C_Blue C_Green OUTPUT
0 Blue Red 0 0 0 1
1 Red Green 0 Blue Green 0
2 Green Red 0 Blue 0 1
3 Red Blue Red 0 0 0
Upvotes: 1