Reputation: 4575
I have a pandas dataframe with a column named “Notes”. It has entries like the example below. I would like to create dummy variable columns based on a list:
Lst=['loan','Borrower','debts']
That is, I'd like a binary flag column for each entry in the list, set to 1 if the string in the "Notes" column contains that entry and 0 otherwise. Can anyone suggest how to do this?
data:
print(data_df[['Id','Notes']][:10])
Id Notes
59 60 568549 added on 11/04/09 > I use my current l...
76 77 I would like to use this loan to consolidate c...
88 89 Borrower added on 06/28/10 > I would really ...
229 230 I just got married and ran up some debt during...
output:
Id Notes loan Borrower debts
59 60 568549 added on 11/04/09 > I use my current l... 0 0 0
76 77 I would like to use this loan to consolidate c... 1 0 0
88 89 Borrower added on 06/28/10 > I would really ... 0 1 0
229 230 I just got married and ran up some debt during... 0 0 1
Upvotes: 0
Views: 2202
Reputation: 137
To convert the data with a function, create a new column and assign it the result of calling apply on the existing column with a lambda expression, like so:
<dataframe>['new column name'] = <dataframe>['some existing column name'].apply(<some function>)
in your case more specifically:
data_df['loan'] = data_df.Notes.apply(lambda x: 1 if 'loan' in x else 0)
data_df['Borrower'] = data_df.Notes.apply(lambda x: 1 if 'Borrower' in x else 0)
data_df['debt'] = data_df.Notes.apply(lambda x: 1 if 'debt' in x else 0)
With a longer keyword list you could wrap this in a function or a loop, but this gets the idea across.
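For a longer list, the same idea can be written as a loop over the keywords. This is only a sketch: the sample rows are made up for illustration, and it uses the vectorized `Series.str.contains` with `regex=False` (plain substring match) and `na=False` (missing notes count as no match) rather than `apply`.

```python
import pandas as pd

# Made-up sample rows for illustration, not the asker's real data
data_df = pd.DataFrame({
    "Id": [60, 77, 89],
    "Notes": [
        "I use my current line of credit",
        "I would like to use this loan to consolidate",
        "Borrower added on 06/28/10 > I would really",
    ],
})
Lst = ["loan", "Borrower", "debts"]

# One 0/1 column per keyword; the match is case-sensitive
for word in Lst:
    data_df[word] = (
        data_df["Notes"].str.contains(word, regex=False, na=False).astype(int)
    )
```

Pass `case=False` to `str.contains` if "borrower" and "Borrower" should both count.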
Upvotes: 0
Reputation: 323226
Use str.findall to pull out the matching keywords, then str.get_dummies:
df.Note.str.findall('|'.join(Lst)).str[0].str.get_dummies()
Out[639]:
Borrower debts loan
0 0 0 1
1 1 0 0
2 0 1 0
yourdf=pd.concat([df,df.Note.str.findall('|'.join(Lst)).str[0].str.get_dummies()],axis=1)
yourdf
Out[640]:
Note Borrower debts loan
0 loan lll 0 0 1
1 llll Borrower 1 0 0
2 ......debts 0 1 0
The example frame used above was built with:
df=pd.DataFrame({'Note':['loan lll','llll Borrower','......debts']})
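One caveat: `.str[0]` keeps only the first match in each row, so a note that mentions two keywords gets a single flag. A possible variant, sketched here with two extra made-up rows, joins all matches with `|` (the default separator of `str.get_dummies`) and reindexes so every keyword in `Lst` gets a column even when it never appears.

```python
import re
import pandas as pd

Lst = ["loan", "Borrower", "debts"]
# The last two rows are made up to show multiple matches and no match
df = pd.DataFrame({"Note": ["loan lll", "llll Borrower", "......debts",
                            "Borrower loan", "nothing"]})

# Escape the keywords in case any contains regex metacharacters
pattern = "|".join(map(re.escape, Lst))

# findall returns a list of matches per row; join them so get_dummies sees all
matches = df["Note"].str.findall(pattern)
dummies = matches.str.join("|").str.get_dummies()

# Guarantee one column per keyword, in the order given by Lst
dummies = dummies.reindex(columns=Lst, fill_value=0)

yourdf = pd.concat([df, dummies], axis=1)
```

Rows with no match produce an empty string after the join, which `get_dummies` turns into all zeros.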
Upvotes: 1