Replace string values in a dataframe based on regex match

Question

I have a pandas DataFrame with a column called "accredited". This column should hold either the date of accreditation, e.g. "10/10/2011", or the text "Not accredited". But in most cases where the business isn't accredited, the column contains some longer text, like "This business is not accredited.....". I want to replace the whole text with just "Not accredited".

Now, I wrote a function:

def notAcredited(string):
    if ('Not' in string or 'not' in string):
        return  'Not Accredited'

I'm currently applying the function with a loop. Is it possible to do this with the ".apply" method instead?

for i in range(len(df_1000_1500)):
    accreditacion = notAcredited(df_1000_1500['BBBAccreditation'][i])
    if accreditacion == 'Not Accredited':
        # .loc avoids chained assignment, which may not write back to the frame
        df_1000_1500.loc[i, 'BBBAccreditation'] = accreditacion

unutbu · Accepted Answer

You could use the vectorized string method Series.str.replace:

In [72]: df = pd.DataFrame({'accredited': ['10/10/2011', 'is not accredited']})

In [73]: df
Out[73]: 
          accredited
0         10/10/2011
1  is not accredited

In [74]: df['accredited'] = df['accredited'].str.replace(r'(?i).*not.*', 'not accredited', regex=True)

In [75]: df
Out[75]: 
       accredited
0      10/10/2011
1  not accredited

The first argument passed to replace, e.g. r'(?i).*not.*', can be any regex pattern. The second can be any regex replacement value, the same kind of string that re.sub would accept. The (?i) in the regex pattern makes the pattern case-insensitive, so not, Not, NOt, NoT, etc. would all match.
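As a rough check of how the pattern behaves, here is a minimal sketch that compares Series.str.replace against a plain re.sub call on each element. The sample values are made up for illustration and are not from the question's data. regex=True is passed explicitly because newer pandas versions treat the pattern as a literal string by default.

```python
import re
import pandas as pd

# Hypothetical sample values (assumed for illustration only).
s = pd.Series(["10/10/2011", "This business is NOT accredited", "Not yet"])

pattern = r"(?i).*not.*"  # (?i) makes the match case-insensitive

# Vectorized replacement; regex=True asks pandas to interpret the pattern as a regex.
replaced = s.str.replace(pattern, "Not accredited", regex=True)

# The same substitution done element by element with re.sub.
expected = [re.sub(pattern, "Not accredited", value) for value in s]

print(replaced.tolist())
```

Note that the pattern matches "not" anywhere in the string, including inside words such as "another". If that matters for your data, r'(?i).*\bnot\b.*' would restrict the match to the whole word.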

Series.str.replace Cythonizes the calls to re.sub, which makes it faster than what you could achieve using apply, since apply uses a Python loop.
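For completeness, since the question asks about .apply: a minimal sketch of that route, next to a boolean-mask alternative. The helper not_accredited and the sample rows are assumptions made for illustration. Unlike the function in the question, the helper returns the original text when there is no match, so apply does not fill those rows with None.

```python
import pandas as pd

def not_accredited(text):
    """Return 'Not accredited' if the text contains 'not' in any case, else the text unchanged."""
    return "Not accredited" if "not" in text.lower() else text

# Hypothetical sample rows (assumed for illustration only).
df = pd.DataFrame({
    "BBBAccreditation": [
        "10/10/2011",
        "This business is not accredited",
        "Not accredited yet",
    ]
})

# Option 1: apply calls the Python function once per element.
via_apply = df["BBBAccreditation"].apply(not_accredited)

# Option 2: build a boolean mask with str.contains and assign through .loc.
mask = df["BBBAccreditation"].str.contains("not", case=False, regex=False)
df.loc[mask, "BBBAccreditation"] = "Not accredited"

print(df)
```

Both options assume the column holds only strings. If it contains missing values, the helper would need a type check and str.contains would need its na argument set.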
