Reputation: 241
I am wanting to identify all instances within my Pandas csv file that contains text for a specific column, in this case the 'Notes' column, where there are any instances the word 'excercise' is mentioned. Once the rows are identified that contain the 'excercise' keyword in the 'Notes' columnn, I want to create a new column called 'ExcerciseDay' that then has a 1 if the 'excercise' condition was met or a 0 if it was not. I am having trouble because the text can contain long string values in the 'Notes' column (i.e. 'Excercise, Morning Workout,Alcohol Consumed, Coffee Consumed') and I still want it to identify 'excercise' even if it is within a longer string.
I tried the function below in order to identify all text that contains the word 'exercise' in the 'Notes' column. No rows are selected when I use this function and I know it is likely because of the * operator but I want to show the logic. There is probably a much more efficient way to do this but I am still relatively new to programming and python.
def IdentifyExercise(row):
if row['Notes'] == '*exercise*':
return 1
elif row['Notes'] != '*exercise*':
return 0
JoinedTables['ExerciseDay'] = JoinedTables.apply(lambda row : IdentifyExercise(row), axis=1)
Upvotes: 2
Views: 12316
Reputation: 302
Another way would be:
JoinedTables['ExerciseDay'] =[1 if "exercise" in x else 0 for x in JoinedTables['Notes']]
(Probably not the fastest solution)
Upvotes: 0
Reputation: 402413
You can also use np.where
:
JoinedTables['ExerciseDay'] = \
np.where(JoinedTables['Notes'].str.contains('exercise'), 1, 0)
Upvotes: 1
Reputation: 862591
Convert boolean Series created by str.contains
to int
by astype
:
JoinedTables['ExerciseDay'] = JoinedTables['Notes'].str.contains('exercise').astype(int)
For not case sensitive:
JoinedTables['ExerciseDay'] = JoinedTables['Notes'].str.contains('exercise', case=False)
.astype(int)
Upvotes: 5