Assign values to DF rows based on string portions

Question

I have a dataframe with 1 column (these are header names for another dataframe). I am trying to assign weightings to these based on string names contained in the rows. They all have long names (like classes and subclasses) separated by underscores, for example: email_Trading Only, readership_unique_client, roadshow_NDR_Con_Call_Meetings, forum_meeting.

I would like to assign weights to these based on the substrings that occur before, between, or after the underscores.

Was thinking about creating a dictionary of sorts, but not sure how to loop and iterate through all the rows properly. Pseudocode here:

for i in rows: 
     if i contains 'email' #before first underscore
          then 0.5 #assigned to corresponding row in new column of DF

Sample data and output (based on the first string portion before the underscore):

                                TITLES   WEIGHTS     
2                        emp_full_name     0
3                      emp_office_code     0
4              emp_country_office_code     0
..                                 ...
171   forum_presentation_Platinum Plus     0.5
172  forum_presentation_Private Client     0.5
173          forum_presentation_Silver     0.5

RichieV · Accepted Answer

See the pandas user guide on how to test for strings that contain a pattern.

You can solve it with something like

df['WEIGHTS'] = df.TITLES.str.contains('email') * 0.5

Or create the column and then update it

df['WEIGHTS'] = 0.0
df.loc[df.TITLES.str.contains('email'), 'WEIGHTS'] = 0.5
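
As a minimal sketch of the two forms above, the following runs both on a small frame and stores the results side by side. The three titles and the WEIGHTS_LOC column name are made up for illustration; only TITLES and WEIGHTS come from the question.

```python
import pandas as pd

# Hypothetical sample titles, modelled on the ones in the question
df = pd.DataFrame(
    {"TITLES": ["emp_full_name", "email_Trading Only", "forum_meeting"]}
)

# One-liner: the boolean mask times the weight gives 0.5 for matches, 0.0 otherwise
df["WEIGHTS"] = df["TITLES"].str.contains("email") * 0.5

# Two-step form: set a float default first, then overwrite the matching rows
# (WEIGHTS_LOC is an illustrative column name, used only to compare results)
df["WEIGHTS_LOC"] = 0.0
df.loc[df["TITLES"].str.contains("email"), "WEIGHTS_LOC"] = 0.5

print(df)
```

Note that str.contains matches 'email' anywhere in the title, not only before the first underscore.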

Update

str.contains treats the pattern as a regular expression by default, so you can match several alternatives at once

df.loc[df.TITLES.str.contains('email|forum'), 'WEIGHTS'] = 0.5
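
If the keyword should only count when it is the first portion of the title, one option is to anchor the pattern at the start of the string. The sketch below uses str.match, which only matches from the beginning; the sample titles (including emp_email_address) are hypothetical and chosen to show the difference.

```python
import pandas as pd

# Hypothetical titles; the last one contains 'email' but not as its first portion
df = pd.DataFrame(
    {"TITLES": ["email_Trading Only", "forum_meeting", "emp_email_address"]}
)

# str.match tests the pattern at the start of each string only,
# so this is True when the title begins with 'email_' or 'forum_'
is_prefixed = df["TITLES"].str.match(r"(?:email|forum)_")

df["WEIGHTS"] = 0.0
df.loc[is_prefixed, "WEIGHTS"] = 0.5

print(df)
```

The non-capturing group (?:...) keeps the alternation together without creating a capture group.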

You can also get the first part of the strings with

label = df.TITLES.str.split('_').str[0]

Then use a mapper with Series.replace, but you would need to include every possible prefix, because labels missing from the mapper are left unchanged as strings

df['WEIGHTS'] = label.replace({'email': 0.5, 'forum': 0.2 ...})
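
A variant that avoids listing every prefix is Series.map followed by fillna, sketched below. This swaps replace for map, so it is an alternative to the line above rather than the same call. The weights dictionary holds only the two values mentioned in this answer, and the default of 0 for unlisted prefixes is an assumption based on the sample output in the question.

```python
import pandas as pd

# Hypothetical sample titles, modelled on the ones in the question
df = pd.DataFrame(
    {"TITLES": ["emp_full_name", "email_Trading Only", "forum_presentation_Silver"]}
)

# Only the prefixes that should carry a weight need to be listed
weights = {"email": 0.5, "forum": 0.2}

# First portion of each title, i.e. everything before the first underscore
label = df["TITLES"].str.split("_").str[0]

# map returns NaN for prefixes missing from the dict; fillna supplies the default
df["WEIGHTS"] = label.map(weights).fillna(0.0)

print(df)
```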
