DEB
DEB

Reputation: 241

Select rows following specific patterns in a pandas dataframe

I have a csv file that I read into a pandas dataframe. There are two specific columns, 'Notes' and 'ActivityType', that I want to use as criteria. If the 'Notes' column contains a string value of 'Morning exercise' or ' Morning workout' and/or the 'ActivityType' column contains any string value (majority of cells are Null and I don't want Null values counted) then make a new column 'MorningExercise' and insert a 1 if either conditions are met or a 0 if neither are.

I have been using the code below to create a new column and insert a 1 or 0 if the text conditions are met in the 'Notes' column, but I have not figured out how to include a 1 if the 'ActivityType' column contains any string value.

JoinedTables['MorningExercise'] = JoinedTables['Notes'].str.contains(('Morning workout' or 'Morning exercise'), case=False, na=False).astype(int)

For the 'ActivityType' column, I would think to use the pd.notnull() function as the critieria.

I really just need a way in python to see if either criteria are met in a row and if so, input a 1 or 0 in a new column.

Upvotes: 2

Views: 1254

Answers (1)

cs95
cs95

Reputation: 402323

You'll need to fashion a regex pattern to use with str.contains:

regex = r'Morning\s*(?:workout|exercise)'
JoinedTables['MorningExercise'] = \
       JoinedTables['Notes'].str.contains(regex, case=False, na=False).astype(int)

Details

Morning       # match "Morning"
\s*           # 0 or more whitespace chars
(?:           # open non-capturing group
workout       # match "workout" 
|             # OR operator
exercise      # match "exercise"
)

The pattern will look for Morning followed by either workout or exercise.

Upvotes: 5

Related Questions