Reputation: 139
I want to run query on multiple columns (Data1, Data2) to find the values in other columns (Column1, Column2, Column3) if it exists in any of these columns (considering these columns have empty cells also)
My data looks like this:
Column1 Column2 Column3 Data1 Data2
sunday monday sunday friday
sunday friday tuesday friday sunday
sunday monday thursday tuesday
I want the output like:
Column1 Column2 Column3 Data1 Exist_Data1 Data2 Exist_Data2
sunday monday sunday Yes friday Yes
sunday friday tuesday friday Yes wednesday No
sunday monday thursday No tuesday Yes
I have tried the command below:
pd.Series(['A', 'B']).isin(df.columns).all() (as mentioned in the Check if multiple columns exist in a df
but it check in all the columns. I have modified the df.columns but doesn't seem to work either.
*******UPDATE ********
Editing after trying the code:
As can be seen in the image below, there are some entries in Data1 and Data2 that exists in the other columns but still says "No". (There are no whitespace characters - all are removed using strip()" Double checked after writing the CSV and checking for spaces.
Upvotes: 1
Views: 1295
Reputation: 862406
You need Series.isin
with flatten all values of columns to one long list:
cols = ['Column1','Column2','Column3']
v = np.ravel(df[cols])
df['Exist_Data1'] = np.where(df['Data1'].isin(v),'Yes', 'No')
df['Exist_Data2'] = np.where(df['Data2'].isin(v),'Yes', 'No')
#alternative
df[['Exist_Data1','Exist_Data2']] = np.where(df[['Data1', 'Data2']].isin(v),'Yes', 'No')
print (df)
Column1 Column2 Column3 Data1 Data2 Exist_Data1 Exist_Data2
0 sunday monday NaN sunday friday Yes Yes
1 sunday friday tuesday friday wednesday Yes No
2 sunday NaN monday thursday tuesday No Yes
Upvotes: 1