fred.schwartz
fred.schwartz

Reputation: 2155

Pandas isin on multiply columns

I have a data frame

    IN_1      IN_2     C_Red   C_Blue  C_Green     
0   Blue     Red        0       0        0          
1   Red      Green      0      Blue      Green      
2   Green    Red        0      Blue       0         
3   Red      Blue      Red      0         0         

I want to say if any of the values in C_Red or C_Blue or C_Green, appear in either IN_1 or IN_2, then give a 0 to my output column.

    IN_1      IN_2     C_Red   C_Blue  C_Green     OUTPUT
0   Blue     Red        0       0        0          1
1   Red      Green      0      Blue      Green      0
2   Green    Red        0      Blue       0         1
3   Red      Blue      Red      0         0         0

so far I've done this

df.apply(lambda x: [(c in [C_Red,C_Blue,C_Green]) for c in df.loc[x,[IN_1, IN_2]].values] )

But can't quite finish it off

Upvotes: 1

Views: 71

Answers (2)

sammywemmy
sammywemmy

Reputation: 28644

A list comprehension fits in here :

in_cols = df.columns[df.columns.str.startswith("IN")]
c_cols = df.columns[df.columns.str.startswith("C")]

df["output"] = [0 if set(val.array).intersection(val1.array) else 1
                for (key, val), (key1, val1)
                in zip(df[in_cols].T.items(), df[c_cols].T.items())
                ]

    IN_1    IN_2    C_Red   C_Blue  C_Green output
0   Blue    Red     0        0         0        1
1   Red     Green   0        Blue      Green    0
2   Green   Red     0        Blue      0        1
3   Red     Blue    Red      0         0        0

If your columns are fixed, then you can do the iteration with numpy and hard coded values :

df['output'] = [0 
                if set(entry[:2]).intersection(entry[2:]) 
                else 1 
                for entry in df.to_numpy()]

Upvotes: 1

jezrael
jezrael

Reputation: 862581

Use sets for check intersection with convert output to bool for False if empty set and then pass to numpy.where:

m = df.apply(lambda x: bool(set(x[['IN_1','IN_2']]) & 
                            set(x[['C_Red','C_Blue', 'C_Green']])), axis=1)

Another solution with Series.isin:

f = lambda x: x[['IN_1','IN_2']].isin(x[['C_Red','C_Blue', 'C_Green']])
m = df.apply(f, axis=1).any(axis=1)

Or solution with list comprehension:

zipped = zip(df[['IN_1','IN_2']].values, df[['C_Red','C_Blue', 'C_Green']].values)
m = [bool(set(y) & set(x)) for x, y in zipped]

df['OUTPUT'] = np.where(m, 0, 1)
print (df)
    IN_1   IN_2 C_Red C_Blue C_Green  OUTPUT
0   Blue    Red     0      0       0       1
1    Red  Green     0   Blue   Green       0
2  Green    Red     0   Blue       0       1
3    Red   Blue   Red      0       0       0

Upvotes: 1

Related Questions