Jane Borges

Reputation: 592

Searching for a word within a dataframe column

I have the following dataframe:

    import pandas as pd

    df = pd.DataFrame({'Id_email': [1, 2, 3, 4],
                       'Word': ['_ SENSOR 12', 'new_SEN041', 'engine', 'sens 12'],
                       'Date': ['2018-01-05', '2018-01-06', '2017-01-06', '2018-01-05']})

    print(df)

I would like to search the 'Word' column for variants of the word Sensor.

If a row matches, I want to fill a new column 'Type' with Sensor_Type for that row; if it does not match, I want to fill it with Other.

I tried to implement it as follows (this code is wrong):

    df['Type'] = 'Other'

    for i in range(0, len(df)):
        if (re.search('\\SEN\\b', df['Word'].iloc[i], re.IGNORECASE) or
                re.search('\\sen\\b', df['Word'].iloc[i], re.IGNORECASE)):
            df['Type'].iloc[i] == 'Sensor_Type'
        else:
            df['Type'].iloc[i] == 'Other'

My (wrong) output is as follows:

Id_email  Word         Date        Type
1         _ SENSOR 12  2018-01-05  Other
2         new_SEN041   2018-01-06  Other
3         engine       2017-01-06  Other
4         sens 12      2018-01-05  Other

But, I would like the output to be like this:

Id_email  Word         Date        Type
1         _ SENSOR 12  2018-01-05  Sensor_Type
2         new_SEN041   2018-01-06  Sensor_Type
3         engine       2017-01-06  Other
4         sens 12      2018-01-05  Sensor_Type

Upvotes: 0

Views: 114

Answers (2)

Chris

Reputation: 16172

    import re
    df['Type'] = df.apply(lambda x: 'Sensor_Type' if re.search(r'SEN|sen', x['Word']) else 'Other', axis=1)
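
For context, the loop in the question appears to leave every row as Other for two reasons: df['Type'].iloc[i] == 'Sensor_Type' is a comparison rather than an assignment, and the pattern '\\SEN\\b' asks for a non-whitespace character, then EN, then a word boundary, which none of the sample words satisfy. The pattern SEN|sen also misses mixed-case spellings such as Sensor. The following is a minimal sketch that covers those with re.IGNORECASE; the names SEN_PATTERN and classify are hypothetical and not part of the question.

```python
import re

import pandas as pd

df = pd.DataFrame({'Id_email': [1, 2, 3, 4],
                   'Word': ['_ SENSOR 12', 'new_SEN041', 'engine', 'sens 12'],
                   'Date': ['2018-01-05', '2018-01-06', '2017-01-06', '2018-01-05']})

# Compile once; IGNORECASE matches SEN, sen, Sen, and so on.
SEN_PATTERN = re.compile(r'sen', re.IGNORECASE)


def classify(word):
    """Hypothetical helper: label a word by whether it contains 'sen'."""
    return 'Sensor_Type' if SEN_PATTERN.search(word) else 'Other'


# Series.apply replaces the row-by-row .iloc loop and assigns the result.
df['Type'] = df['Word'].apply(classify)
print(df)
```

Applying the function to the 'Word' column alone, rather than to whole rows with axis=1, keeps the lambda from having to index each row.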

Upvotes: 1

sammywemmy

Reputation: 28729

Use pandas' str.contains and set case=False, so that the search matches sen, SEN, and any other capitalization:

    import numpy as np
    # assign returns a new DataFrame; reassign it to df to keep the column
    df.assign(Type=lambda x: np.where(x.Word.str.contains(r'SEN', case=False),
                                      'Sensor_Type', 'Other'))

       Id_email         Word        Date         Type
    0         1  _ SENSOR 12  2018-01-05  Sensor_Type
    1         2   new_SEN041  2018-01-06  Sensor_Type
    2         3       engine  2017-01-06        Other
    3         4      sens 12  2018-01-05  Sensor_Type
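
If the 'Word' column can contain missing values, str.contains returns a missing result for those rows unless na is given. The sketch below assumes such rows should fall into Other; the fifth row with None is a hypothetical addition to the question's data, made only to show that case.

```python
import numpy as np
import pandas as pd

# The fifth row (Word=None) is hypothetical and not in the original data.
df = pd.DataFrame({'Id_email': [1, 2, 3, 4, 5],
                   'Word': ['_ SENSOR 12', 'new_SEN041', 'engine', 'sens 12', None],
                   'Date': ['2018-01-05', '2018-01-06', '2017-01-06',
                            '2018-01-05', '2018-01-07']})

# case=False ignores capitalization; na=False treats missing words as no match.
mask = df['Word'].str.contains('sen', case=False, na=False)

# np.where picks the first label where mask is True, the second elsewhere.
df['Type'] = np.where(mask, 'Sensor_Type', 'Other')
print(df)
```

A plain substring match like this would also label a word such as absent as Sensor_Type, so the pattern may need tightening depending on the data.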

Upvotes: 3
