Reputation: 719
I was trying to add a new column by giving multiple strings contain conditions using str.contains()
and np.where()
function. By this way, I can have the final result I want.
But, the code is very lengthy. Are there any good ways to reimplement this using pandas function?
df5['new_column'] = np.where(df5['sr_description'].str.contains('gross to net', case=False).fillna(False),1,
np.where(df5['sr_description'].str.contains('gross up', case=False).fillna(False),1,
np.where(df5['sr_description'].str.contains('net to gross',case=False).fillna(False),1,
np.where(df5['sr_description'].str.contains('gross-to-net',case=False).fillna(False),1,
np.where(df5['sr_description'].str.contains('gross-up',case=False).fillna(False),1,
np.where(df5['sr_description'].str.contains('net-to-gross',case=False).fillna(False),1,
np.where(df5['sr_description'].str.contains('gross 2 net',case=False).fillna(False),1,
np.where(df5['sr_description'].str.contains('net 2 gross',case=False).fillna(False),1,
np.where(df5['sr_description'].str.contains('gross net',case=False).fillna(False),1,
np.where(df5['sr_description'].str.contains('net gross',case=False).fillna(False),1,
np.where(df5['sr_description'].str.contains('memo code',case=False).fillna(False),1,0)))))))))))
This output will be,
if those strings contain in 'sr_description' then give a 1
, else 0
to new_column
Maybe store the multiple string conditions in a list then read and apply them to a function.
Edit:
Sample Data:
sr_description new_column
something with gross up. 1
without those words. 0
or with Net to gross 1
if not then we give a '0' 0
Upvotes: 0
Views: 1242
Reputation: 2702
Here is what I came up with.
Code:
import re
import pandas as pd
import numpy as np
# list of the strings we want to check for
check_strs = ['gross to net', 'gross up', 'net to gross', 'gross-to-net', 'gross-up', 'net-to-gross', 'gross 2 net',
'net 2 gross', 'gross net', 'net gross', 'memo code']
# From the re.escape() docs: Escape special characters in pattern.
# This is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.
check_strs_esc = [re.escape(curr_val) for curr_val in check_strs]
# join all the escaped strings as a single regex
check_strs_re = '|'.join(check_strs_esc)
test_col_1 = ['something with gross up.', 'without those words.', np.NaN, 'or with Net to gross', 'if not then we give a "0"']
df_1 = pd.DataFrame(data=test_col_1, columns=['sr_description'])
df_1['contains_str'] = df_1['sr_description'].str.contains(check_strs_re, case=False, na=False)
print(df_1)
Result:
sr_description contains_str
0 something with gross up. True
1 without those words. False
2 NaN False
3 or with Net to gross True
4 if not then we give a "0" False
Note that numpy
isn't required for the solution to function, I'm just using it to test a NaN
value.
Let me know if anything is unclear or your have any questions! :)
Upvotes: 2