user3476463

Reputation: 4575

Create dummy variable from list

I have a pandas dataframe with a column named “Notes”. It has entries like the example below. I would like to create dummy variable columns based on a list:

Lst = ['loan', 'Borrower', 'debts']

That is, for each entry in the list I'd like a binary flag column that is 1 if the string in the “Notes” column contains that entry and 0 otherwise. Can anyone suggest how to do this?

data:

print(data_df[['Id','Notes']][:10])

     Id                                              Notes
59    60   568549 added on 11/04/09 > I use my current l...     
76    77  I would like to use this loan to consolidate c...
88    89    Borrower added on 06/28/10 > I would really ...
229  230  I just got married and ran up some debt during...

output:

     Id                                               Notes  loan  Borrower  debts
59    60   568549 added on 11/04/09 > I use my current l...     0         0      0
76    77  I would like to use this loan to consolidate c...     1         0      0
88    89    Borrower added on 06/28/10 > I would really ...     0         1      0
229  230  I just got married and ran up some debt during...     0         0      1

Upvotes: 0

Views: 2202

Answers (2)

John Ketterer

Reputation: 137

To convert the data with a function, create a new column and assign it the result of calling apply on an existing column with a lambda expression, like so:

<dataframe>['new column name'] = <dataframe>['some existing column name'].apply(<some function>)

In your case, more specifically:

# apply passes each note as a plain Python string, so test with `in` (a str has no .str accessor)
data_df['loan'] = data_df.Notes.apply(lambda x: 1 if 'loan' in x else 0)
data_df['Borrower'] = data_df.Notes.apply(lambda x: 1 if 'Borrower' in x else 0)
data_df['debt'] = data_df.Notes.apply(lambda x: 1 if 'debt' in x else 0)

You could define a helper function if you have many keywords, but this gets the idea across.
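As a rough sketch of that helper-function idea, the same flags can be built in a loop over the list from the question. The function name `add_keyword_flags` and the sample notes below are made up for illustration; the sketch assumes literal, case-sensitive substring matching and treats missing notes as 0.

```python
import pandas as pd


def add_keyword_flags(df, column, keywords):
    """Return a copy of df with one 0/1 column per keyword (hypothetical helper)."""
    out = df.copy()
    for kw in keywords:
        # regex=False: treat the keyword as a literal substring, not a pattern.
        # na=False: a missing note counts as "does not contain".
        out[kw] = out[column].str.contains(kw, regex=False, na=False).astype(int)
    return out


Lst = ['loan', 'Borrower', 'debts']

# Toy data standing in for the real "Notes" column (assumed, not from the question).
data_df = pd.DataFrame({'Notes': [
    'I would like to use this loan',
    'Borrower added on 06/28/10',
    'paying off old debts',
    None,
]})

flagged = add_keyword_flags(data_df, 'Notes', Lst)
print(flagged)
```

Using the vectorized `str.contains` instead of `apply` with a lambda also avoids an error on missing values, since `'loan' in None` would raise a `TypeError`.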

Upvotes: 0

BENY

Reputation: 323226

Use str.findall to pull out the keyword that matches in each row, then str.get_dummies to turn it into indicator columns:

df = pd.DataFrame({'Note':['loan lll','llll Borrower','......debts']})

df.Note.str.findall('|'.join(Lst)).str[0].str.get_dummies()
Out[639]: 
   Borrower  debts  loan
0         0      0     1
1         1      0     0
2         0      1     0
yourdf = pd.concat([df,df.Note.str.findall('|'.join(Lst)).str[0].str.get_dummies()],axis=1)
yourdf
Out[640]: 
            Note  Borrower  debts  loan
0       loan lll         0      0     1
1  llll Borrower         1      0     0
2    ......debts         0      1     0
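Note that `.str[0]` keeps only the first match in each row, so a note mentioning two keywords would get a single flag. A possible variant, sketched below under that assumption, joins all matches back into one string before calling `str.get_dummies`. The extra sample rows, the `re.escape` call, and the `reindex` step are additions for illustration, not part of the answer above.

```python
import re

import pandas as pd

Lst = ['loan', 'Borrower', 'debts']

# Toy data: the third row mentions two keywords, the fourth mentions none (assumed examples).
df = pd.DataFrame({'Note': ['loan lll', 'llll Borrower', 'loan and debts', 'nothing here']})

# Escape each keyword so characters such as '.' or '+' are matched literally.
pattern = '|'.join(re.escape(kw) for kw in Lst)

# findall returns a list of every match per row; joining with '|' gives a
# delimited string that str.get_dummies splits on (its default separator is '|').
matches = df['Note'].str.findall(pattern)
dummies = matches.str.join('|').str.get_dummies()

# Make sure every keyword has a column, in list order, even if it never matched.
dummies = dummies.reindex(columns=Lst, fill_value=0)

result = pd.concat([df, dummies], axis=1)
print(result)
```

The `reindex` step is optional, but it keeps the output shape stable when a keyword is absent from the data.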

Upvotes: 1
