John Doe

Reputation: 10203

Check if string in one column is contained in string of another column in the same row and add new column with matching column name

In addition to my previous question, "Search for value in all DataFrame columns (except first column!) and add new column with matching column name" (where I used a static keyword).

I'd like to check whether the string in the first column is contained in any of the other columns in the same row, and then add a new column with the matching column name(s), i.e. the names of all columns whose values match.

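Currently I'm using this with a static keyword:

import re

keyword = '123'
# Per row: cast each cell to str, remove literal dots, and test case-insensitively for the keyword
f = lambda row: row.apply(str).str.replace(".", "", regex=False).str.contains(keyword, na=False, flags=re.IGNORECASE)
df1 = df.iloc[:, 1:].apply(f, axis=1)

# Dotting the boolean frame with "name, " strings concatenates the names of the matching columns
df.insert(loc=1, column='Matching_Columns', value=df1.dot(df.columns[1:] + ', ').str.strip(', '))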

Sample:

Input:

key | col_B | col_C | col_D | col_E
------------------------------------
123 | abcd  | 12345 | fght  | 7890
567 | tdfe  | 6353  | 0567  | 56789

Output:

key | match       | col_B | col_C | col_D | col_E
-------------------------------------------------
123 | col_C       | abcd  | 12345 | fght  | 7890
567 | col_D,col_E | tdfe  | 6353  | 0567  | 56789

Any help much appreciated!
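For reference, here is a minimal sketch of how the static-keyword pipeline above could read the keyword from each row's first column instead. It assumes the sample data is built as shown (with col_D stored as text so `0567` keeps its leading zero); `row_matches` is a hypothetical helper name, and `re.escape` is an addition so the key is matched literally rather than as a regex.

```python
import re

import pandas as pd

# Sample data from the question; col_D is kept as text so "0567" keeps its leading zero
df = pd.DataFrame({
    "key": [123, 567],
    "col_B": ["abcd", "tdfe"],
    "col_C": [12345, 6353],
    "col_D": ["fght", "0567"],
    "col_E": [7890, 56789],
})


def row_matches(row):
    """Hypothetical helper: flag cells (after the first) that contain this row's key."""
    keyword = re.escape(str(row.iloc[0]))  # escape so the key is matched literally
    cells = row.iloc[1:].astype(str).str.replace(".", "", regex=False)
    return cells.str.contains(keyword, na=False, flags=re.IGNORECASE)


flags = df.apply(row_matches, axis=1)  # boolean DataFrame, one column per searched column
df.insert(loc=1, column="match", value=flags.dot(flags.columns + ",").str.rstrip(","))
print(df)
```

The dot-product trick is kept from the original code; only the keyword now comes from the row itself.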

Upvotes: 0

Views: 1096

Answers (3)

Andy L.

Reputation: 25259

First, use apply to get a boolean DataFrame. Next, use mask to put the column name wherever the value is True, replace the remaining False values with NaN, and use agg to join the non-NaN names in each row:

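import numpy as np

# Boolean frame: does each cell after the first contain that row's key?
df1 = df.astype(str).apply(lambda x: x.iloc[1:].str.contains(x['key']), axis=1)
names = np.tile(df1.columns.to_numpy(), (len(df1), 1))  # one row of column names per df row
df['match'] = df1.mask(df1, names).replace(False, np.nan) \
                 .agg(lambda x: ','.join(x.dropna()), axis=1)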


Out[41]:
   key col_B  col_C col_D  col_E        match
0  123  abcd  12345  fght   7890        col_C
1  567  tdfe   6353  0567  56789  col_D,col_E
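The mask/replace/join chain in the answer above can be traced step by step on a small hand-made frame. This is a sketch assuming a boolean frame shaped like df1 for the sample data; `names` is an illustrative full-shape array built with `np.tile`, used here instead of `df1.columns[None,:]` because multi-dimensional indexing on a pandas Index was deprecated and may fail on recent pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical boolean frame, standing in for df1 in the answer above
df1 = pd.DataFrame(
    {"col_B": [False, False], "col_C": [True, False], "col_D": [False, True], "col_E": [False, True]}
)

# Step 1: an array of column names with the same shape as df1
names = np.tile(df1.columns.to_numpy(), (len(df1), 1))
# Step 2: where df1 is True, take the column name; False cells are left as False
labelled = df1.mask(df1, names)
# Step 3: turn the leftover False cells into NaN so dropna() can discard them
labelled = labelled.replace(False, np.nan)
# Step 4: join the surviving names in each row
match = labelled.agg(lambda row: ",".join(row.dropna()), axis=1)
print(match.tolist())  # ['col_C', 'col_D,col_E']
```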

Upvotes: 2

anky

Reputation: 75100

Another method, using DataFrame.dot():

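# Boolean frame: does each cell contain the row's key? Then drop the key column itself
m = df.astype(str).apply(lambda x: x.str.contains(x['key']), axis=1).iloc[:, 1:]
# Concatenate the names of the True columns and drop the trailing comma
df['match'] = m.dot(m.columns + ',').str[:-1]
print(df)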

   key col_B  col_C col_D  col_E        match
0  123  abcd  12345  fght   7890        col_C
1  567  tdfe   6353  0567  56789  col_D,col_E
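Why the dot() trick works, in a small sketch on a hypothetical boolean frame standing in for `m`: each row is multiplied element-wise with the suffixed column names and the products are summed. In Python, `True * "col_D,"` is `"col_D,"` and `False * "col_D,"` is `""`, so the sum concatenates exactly the matching names.

```python
import pandas as pd

# Hypothetical boolean frame, standing in for m in the answer above
m = pd.DataFrame({"col_C": [True, False], "col_D": [False, True], "col_E": [False, True]})

# Row-wise sum of (bool * "name,"): True keeps the name, False contributes ""
joined = m.dot(m.columns + ",")
print(joined.tolist())  # ['col_C,', 'col_D,col_E,']

# Drop the trailing comma, as the answer does with .str[:-1]
print(joined.str[:-1].tolist())  # ['col_C', 'col_D,col_E']
```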

Upvotes: 2

Vishnudev Krishnadas

Reputation: 10960

>>> df
  to_find col1 col2
0       a   ab   ac
1       b   aa   ba
2       c   bc   ee
>>> df['found_in'] = df.apply(lambda x: ' '.join(x.iloc[1:][x.iloc[1:].str.contains(str(x['to_find']))].index), axis=1)
>>> df
  to_find col1 col2   found_in
0       a   ab   ac  col1 col2
1       b   aa   ba       col2
2       c   bc   ee       col1

For better readability:

>>> def get_columns(x):
...     y = x.iloc[1:]
...     return y.index[y.str.contains(str(x['to_find']))]
... 
>>> df['found_in'] = df.apply(lambda x: ' '.join(get_columns(x)), axis=1)
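A caveat that applies to all three answers: `Series.str.contains` treats the pattern as a regular expression by default. With the numeric keys in the question this makes no difference, but a key containing characters such as `.` or `+` would match more than intended; passing `regex=False` makes it a plain substring test. A small illustration with made-up cell values:

```python
import pandas as pd

cells = pd.Series(["abc", "a.c"])  # hypothetical cell values

# Default: "a.c" is a regex, and "." matches any character
print(cells.str.contains("a.c").tolist())               # [True, True]
# Literal substring test
print(cells.str.contains("a.c", regex=False).tolist())  # [False, True]
```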

Upvotes: 1
