DEB
DEB

Reputation: 241

How to search for specific text within a Pandas dataframe column?

I am wanting to identify all instances within my Pandas csv file that contains text for a specific column, in this case the 'Notes' column, where there are any instances the word 'excercise' is mentioned. Once the rows are identified that contain the 'excercise' keyword in the 'Notes' columnn, I want to create a new column called 'ExcerciseDay' that then has a 1 if the 'excercise' condition was met or a 0 if it was not. I am having trouble because the text can contain long string values in the 'Notes' column (i.e. 'Excercise, Morning Workout,Alcohol Consumed, Coffee Consumed') and I still want it to identify 'excercise' even if it is within a longer string.

I tried the function below in order to identify all text that contains the word 'exercise' in the 'Notes' column. No rows are selected when I use this function and I know it is likely because of the * operator but I want to show the logic. There is probably a much more efficient way to do this but I am still relatively new to programming and python.

def IdentifyExercise(row):
    if row['Notes'] == '*exercise*':
        return 1
    elif row['Notes'] != '*exercise*':
        return 0


JoinedTables['ExerciseDay'] = JoinedTables.apply(lambda row : IdentifyExercise(row), axis=1) 

Upvotes: 2

Views: 12316

Answers (3)

JoseleMG
JoseleMG

Reputation: 302

Another way would be:

JoinedTables['ExerciseDay'] =[1 if "exercise" in x  else 0 for x in JoinedTables['Notes']]

(Probably not the fastest solution)

Upvotes: 0

cs95
cs95

Reputation: 402413

You can also use np.where:

JoinedTables['ExerciseDay'] = \
    np.where(JoinedTables['Notes'].str.contains('exercise'), 1, 0)

Upvotes: 1

jezrael
jezrael

Reputation: 862591

Convert boolean Series created by str.contains to int by astype:

JoinedTables['ExerciseDay'] = JoinedTables['Notes'].str.contains('exercise').astype(int)

For not case sensitive:

JoinedTables['ExerciseDay'] = JoinedTables['Notes'].str.contains('exercise', case=False)
                                                   .astype(int)

Upvotes: 5

Related Questions