pandas dataframe search string in the entire row

Question

I have a pandas dataframe like below. I want to search a text in each row of the dataframe and highlight if that text appears in the row.

For example, I want to search each row for "jones". I want to ignore the case of my search word. In the below case, I would like to add a new column to data called "jones" and it would have values 1,1,0 as that word was found in 1st and 2nd row

I found this post which shows how to find a text in a column, but how could I find a text when I have many columns - say 50+? I thought about concatenating all the columns and creating a new column, but didn't see any function that would concatenate all columns of a dataframe (without asking to type each column name)

I would like to do this for multiple keywords that I have. For example I have list of keyword LLC, Co, Blue, alpha and many more (30+)

sales = [{'account': 'Jones LLC', 'Jan': '150', 'Feb': '200', 'Mar': '140'},
         {'account': 'Alpha Co',  'Jan': 'Jones', 'Feb': '210', 'Mar': '215'},
         {'account': 'Blue Inc',  'Jan': '50',  'Feb': '90',  'Mar': '95' }]
df = pd.DataFrame(sales)

Source DF:

   Feb    Jan  Mar    account
0  200    150  140  Jones LLC
1  210  Jones  215   Alpha Co
2   90     50   95   Blue Inc

Desired DF:

   Feb    Jan  Mar    account  jones  llc  co  blue  alpha
0  200    150  140  Jones LLC      1    1   0     0      0
1  210  Jones  215   Alpha Co      1    0   1     0      1
2   90     50   95   Blue Inc      0    0   0     1      0

Little Bobby Tables · Accepted Answer

Here we use pandas built-in str function contains, along with apply and then bring it all together with any as follows,

search_string = 'Jones'

df[search_string] = (df.apply(lambda x: x.str.contains(search_string))
                       .any(axis=1).astype(int))
df

Out[2]:
     Feb    Jan    Mar   account     Jones
0    200    150    140   Jones LLC   1
1    210    Jones  215   Alpha Co    1
2    90     50     95    Blue Inc    0

This can be easily extended as contains uses regular expressions to do the matching. It also has a case arg so that you can make it case-insensitive and search for both Jones and jones.

In order to loop over a list of search words we need to make the following changes. By storing each search result (a Series) in a list, we use the list to join the series together in to DataFrame. We do this because we don't want to search new columns for the new search_string,

df_list = []

for search_string in ['Jones', 'Co', 'Alpha']:
    #use above method but rename the series instead of setting to
    # a columns. The append to a list.
    df_list.append(df.apply(lambda x: x.str.contains(search_string))
                     .any(axis=1)
                     .astype(int)
                     .rename(search_string))

#concatenate the list of series into a DataFrame with the original df
df = pd.concat([df] + df_list, axis=1)
df

Out[5]:
    Feb    Jan     Mar    account    Jones  Co   Alpha
0   200    150     140    Jones LLC  1      0    0
1   210    Jones   215    Alpha Co   1      1    1
2   90     50      95     Blue Inc   0      0    0

pandas dataframe search string in the entire row

Answers (2)

Related Questions