Reputation: 10443
I have a pandas DataFrame with a column called "accredited". This column should contain either the accreditation date, e.g. "10/10/2011", or the text "Not accredited". But in most cases where the business isn't accredited, the column contains longer text, like "This business is not accredited.....". I want to replace the whole text with just "Not accredited".
Now, I wrote a function:
def notAcredited(string):
    if ('Not' in string or 'not' in string):
        return 'Not Accredited'
I'm applying the function with a loop. Is it possible to do this with the ".apply" method instead?
for i in range(len(df_1000_1500)):
    accreditacion = notAcredited(df_1000_1500['BBBAccreditation'][i])
    if accreditacion == 'Not Accredited':
        # .loc avoids chained assignment, which may write to a copy
        df_1000_1500.loc[i, 'BBBAccreditation'] = accreditacion
Upvotes: 3
Views: 1663
Reputation: 880997
You could use the vectorized string method Series.str.replace:
In [72]: df = pd.DataFrame({'accredited': ['10/10/2011', 'is not accredited']})
In [73]: df
Out[73]:
          accredited
0         10/10/2011
1  is not accredited
In [74]: df['accredited'] = df['accredited'].str.replace(r'(?i).*not.*', 'not accredited', regex=True)
In [75]: df
Out[75]:
       accredited
0      10/10/2011
1  not accredited
The first argument passed to replace, e.g. r'(?i).*not.*', can be any regex pattern. The second can be any regex replacement value: the same kind of string that re.sub would accept. The (?i) in the regex pattern makes the pattern case-insensitive, so not, Not, NOt, NoT, etc. would all match.
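To see what the pattern does on its own, here is a minimal sketch using plain re.sub outside of pandas. The sample strings are made up for illustration; they are not from the original data.

```python
import re

# Same pattern as above: (?i) turns on case-insensitive matching,
# and .*not.* swallows the whole string when it contains "not".
PATTERN = r'(?i).*not.*'

# Hypothetical sample values, for illustration only.
samples = [
    '10/10/2011',
    'This business is NOT accredited',
    'Not accredited',
]

# Strings without "not" are returned unchanged by re.sub.
cleaned = [re.sub(PATTERN, 'not accredited', s) for s in samples]
print(cleaned)
```

Note that the pattern also matches "not" inside longer words such as "another". If that matters for your data, r'(?i).*\bnot\b.*' is one way to tighten it.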
Series.str.replace Cythonizes the calls to re.sub, which makes it faster than what you could achieve using apply, since apply uses a Python loop.
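To answer the literal question: yes, apply works too, as long as the function returns the original value when there is nothing to replace (otherwise unmatched rows become None). Below is a rough sketch of that, next to a boolean-mask variant. The helper name not_accredited and the sample data are hypothetical, and neither option has been timed here.

```python
import pandas as pd

def not_accredited(value):
    # Hypothetical helper: normalize any text containing "not"
    # (case-insensitive); leave everything else untouched.
    if 'not' in value.lower():
        return 'Not accredited'
    return value

# Made-up sample data, for illustration only.
df = pd.DataFrame({'accredited': ['10/10/2011',
                                  'This business is not accredited',
                                  'Not accredited']})

# Option 1: apply calls the helper once per element.
via_apply = df['accredited'].apply(not_accredited)

# Option 2: build a boolean mask, then overwrite the matching rows.
# na=False treats missing values as "no match".
is_not = df['accredited'].str.contains('not', case=False, na=False)
via_mask = df['accredited'].mask(is_not, 'Not accredited')

print(via_apply.tolist())
print(via_mask.tolist())
```

Both options should give the same result on this sample. The mask version has the advantage of tolerating missing values, whereas the helper above assumes every cell is a string.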
Upvotes: 4