Jiayu Zhang
Jiayu Zhang

Reputation: 719

How to add a new column with multiple string contain conditions in python pandas other than using np.where?

I was trying to add a new column by giving multiple strings contain conditions using str.contains() and np.where() function. By this way, I can have the final result I want.

But, the code is very lengthy. Are there any good ways to reimplement this using pandas function?

df5['new_column'] = np.where(df5['sr_description'].str.contains('gross to net', case=False).fillna(False),1,
    np.where(df5['sr_description'].str.contains('gross up', case=False).fillna(False),1,
    np.where(df5['sr_description'].str.contains('net to gross',case=False).fillna(False),1,
    np.where(df5['sr_description'].str.contains('gross-to-net',case=False).fillna(False),1,
    np.where(df5['sr_description'].str.contains('gross-up',case=False).fillna(False),1,
    np.where(df5['sr_description'].str.contains('net-to-gross',case=False).fillna(False),1,
    np.where(df5['sr_description'].str.contains('gross 2 net',case=False).fillna(False),1,
    np.where(df5['sr_description'].str.contains('net 2 gross',case=False).fillna(False),1,
    np.where(df5['sr_description'].str.contains('gross net',case=False).fillna(False),1,
    np.where(df5['sr_description'].str.contains('net gross',case=False).fillna(False),1,
    np.where(df5['sr_description'].str.contains('memo code',case=False).fillna(False),1,0)))))))))))

This output will be,

if those strings contain in 'sr_description' then give a 1, else 0 to new_column

Maybe store the multiple string conditions in a list then read and apply them to a function.

Edit:

Sample Data:

sr_description                  new_column
something with gross up.           1
without those words.               0
or with Net to gross               1
if not then we give a '0'          0

Upvotes: 0

Views: 1242

Answers (1)

AMC
AMC

Reputation: 2702

Here is what I came up with.

Code:

import re
import pandas as pd
import numpy as np

# list of the strings we want to check for
check_strs = ['gross to net', 'gross up', 'net to gross', 'gross-to-net', 'gross-up', 'net-to-gross', 'gross 2 net',
             'net 2 gross', 'gross net', 'net gross', 'memo code']

# From the re.escape() docs: Escape special characters in pattern. 
# This is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.
check_strs_esc = [re.escape(curr_val) for curr_val in check_strs]

# join all the escaped strings as a single regex
check_strs_re = '|'.join(check_strs_esc)

test_col_1 = ['something with gross up.', 'without those words.', np.NaN, 'or with Net to gross', 'if not then we give a "0"']
df_1 = pd.DataFrame(data=test_col_1, columns=['sr_description'])

df_1['contains_str'] = df_1['sr_description'].str.contains(check_strs_re, case=False, na=False)

print(df_1)

Result:

              sr_description  contains_str
0   something with gross up.          True
1       without those words.         False
2                        NaN         False
3       or with Net to gross          True
4  if not then we give a "0"         False

Note that numpy isn't required for the solution to function, I'm just using it to test a NaN value.

Let me know if anything is unclear or your have any questions! :)

Upvotes: 2

Related Questions