I have a dataframe with one column containing the header names of another dataframe. I am trying to assign weightings to these based on the strings contained in each row. They all have long names (like classes and subclasses) separated by underscores, for example: email_Trading Only, readership_unique_client, roadshow_NDR_Con_Call_Meetings, forum_meeting.
I would like to assign weights based on strings that occur before, in between, or after the underscores.
I was thinking about creating a dictionary of sorts, but I'm not sure how to loop through all the rows properly. Roughly what I have in mind:
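df['WEIGHTS'] = 0.0  # new column, default weight
for i, title in df['TITLES'].items():
    if title.split('_')[0] == 'email':  # string before the first underscore
        df.loc[i, 'WEIGHTS'] = 0.5  # assigned to the corresponding row of the new column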
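Below is a minimal sketch of the dictionary idea that also handles strings between and after the underscores. It assumes the column is named `TITLES`, as in the sample below. It also assumes a title should get the weight of the first underscore-separated token found in a hypothetical `token_weights` dict, and 0 when no token matches. The weights themselves are made up:

```python
import pandas as pd

# Hypothetical weights per token; a token may sit before, between or after underscores
token_weights = {"email": 0.5, "forum": 0.5, "NDR": 0.3}

def weight_for(title: str) -> float:
    # Check each underscore-separated token left to right and return the first weight found
    for token in title.split("_"):
        if token in token_weights:
            return token_weights[token]
    return 0.0  # no known token in this title

df = pd.DataFrame({"TITLES": [
    "email_Trading Only",
    "readership_unique_client",
    "roadshow_NDR_Con_Call_Meetings",
    "forum_meeting",
]})
df["WEIGHTS"] = df["TITLES"].apply(weight_for)
print(df)
```

If a title contains several known tokens, the leftmost one wins. Reorder the check or combine the weights if that is not what you want.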
Sample data and desired output (based on the portion of the string before the first underscore):
     TITLES                             WEIGHTS
2    emp_full_name                      0
3    emp_office_code                    0
4    emp_country_office_code            0
..   ...                                ...
171  forum_presentation_Platinum Plus   0.5
172  forum_presentation_Private Client  0.5
173  forum_presentation_Silver          0.5
See the pandas user guide on testing whether strings contain a pattern.
You can solve it with something like:
df['WEIGHTS'] = df.TITLES.str.contains('email') * 0.5
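This works because `str.contains` returns a boolean Series, and multiplying booleans by 0.5 gives 0.5 for True and 0.0 for False. Here is a small check using titles from the question:

```python
import pandas as pd

df = pd.DataFrame({"TITLES": ["emp_full_name", "email_Trading Only", "forum_meeting"]})

# Boolean mask times 0.5: True -> 0.5, False -> 0.0
df["WEIGHTS"] = df.TITLES.str.contains("email") * 0.5
print(df)
```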
Or create the column first and then update it (starting from 0.0 keeps the column float, so assigning 0.5 does not change its dtype):
df['WEIGHTS'] = 0.0
df.loc[df.TITLES.str.contains('email'), 'WEIGHTS'] = 0.5
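Each `.loc` assignment only touches the rows its mask selects, so you can stack several of them to give different prefixes different weights. In the sketch below, the 0.2 weight for `forum` is an arbitrary illustrative value. If a title matches more than one rule, the later assignment wins:

```python
import pandas as pd

df = pd.DataFrame({"TITLES": ["emp_full_name", "email_Trading Only", "forum_meeting"]})

df["WEIGHTS"] = 0.0  # default for rows that no rule matches
# Each line changes only the rows selected by its boolean mask
df.loc[df.TITLES.str.contains("email"), "WEIGHTS"] = 0.5
df.loc[df.TITLES.str.contains("forum"), "WEIGHTS"] = 0.2
print(df)
```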
Update
str.contains treats the pattern as a regular expression by default (regex=True), so you can match several alternatives at once:
df.loc[df.TITLES.str.contains('email|forum'), 'WEIGHTS'] = 0.5
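`str.contains` matches the pattern anywhere in the string. That means `email` would also hit a title where it appears after an underscore. To match only the part before the first underscore, one option is to anchor the pattern with `^` and group the alternatives with a non-capturing `(?:...)`. Capturing groups make pandas emit a UserWarning suggesting `str.extract`. In this sketch, `readership_email_opens` is a made-up title used only to show the difference:

```python
import pandas as pd

df = pd.DataFrame({"TITLES": ["email_Trading Only", "forum_meeting", "readership_email_opens"]})

df["WEIGHTS"] = 0.0
# ^ anchors at the start of the title; (?:...) groups the alternatives without capturing
mask = df.TITLES.str.contains(r"^(?:email|forum)_")
df.loc[mask, "WEIGHTS"] = 0.5
print(df)
```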
You can also get the first part of the strings with
label = df.TITLES.str.split('_').str[0]  # text before the first underscore
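Here is a quick check, using titles from the question, of splitting on whitespace (no argument) versus splitting on underscores:

```python
import pandas as pd

titles = pd.Series(["email_Trading Only", "forum_presentation_Silver", "emp_full_name"])

# split() with no argument splits on whitespace, so underscores are not separators
print(titles.str.split().str[0].tolist())
# ['email_Trading', 'forum_presentation_Silver', 'emp_full_name']

# split('_') splits on underscores; .str[0] takes the piece before the first one
label = titles.str.split("_").str[0]
print(label.tolist())
# ['email', 'forum', 'emp']
```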
Then map the labels to weights with Series.replace, but you would need to include every possible prefix, because labels missing from the dict are left unchanged as strings:
df['WEIGHTS'] = label.replace({'email': 0.5, 'forum': 0.2})  # ... plus every other prefix
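An alternative sketch swaps `Series.replace` for `Series.map`. `map` turns labels that are missing from the dict into NaN, and `fillna` can then give them a default weight. With this approach only the weighted prefixes need listing, which matches the sample output where `emp` rows get 0. The weights below are illustrative:

```python
import pandas as pd

df = pd.DataFrame({"TITLES": ["emp_full_name", "email_Trading Only", "forum_presentation_Silver"]})

label = df.TITLES.str.split("_").str[0]  # text before the first underscore
# Labels not in the dict become NaN; fillna(0.0) gives them a default weight
df["WEIGHTS"] = label.map({"email": 0.5, "forum": 0.2}).fillna(0.0)
print(df)
```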