Reputation: 3043
Working from a pandas dataframe trying to sanitize a column from something like $12,342
to 12342
and make the column into an int or float. Found one row though with 736[4]
so I have to remove everything within the square brackets, brackets included.
Code so far
df2['Average Monthly Wage $'] = df2['Average Monthly Wage $'].str.replace('$','')
df2['Average Monthly Wage $'] = df2['Average Monthly Wage $'].str.replace(',','')
df2['Average Monthly Wage $'] = df2['Average Monthly Wage $'].str.replace(' ','')
The line below is what's supposed to handle and remove the square brackets and intentionally with it's content too.
df2['Average Monthly Wage $'] = df2['Average Monthly Wage $'].str.replace(r'[[^]]*\)','')
To some dev's this is trivial but I've not really used regular expressions often enough to know this and I've also checked around and from one such stack example formulated the above.
Upvotes: 4
Views: 1379
Reputation: 627607
You need to use
df2['Average Monthly Wage $'] = df2['Average Monthly Wage $'].str.replace(r'\[[^][]*]','',regex=True)
NOTE: you must use regex=True
if you are using a regex replacement with Series.str.replace
.
If you plan to also remove any whitespace before the initial [
, you may add \s*
at the start of the pattern and use
df2['Average Monthly Wage $'] = df2['Average Monthly Wage $'].str.replace(r'\s*\[[^][]*]','',regex=True)
See the regex demo.
Details:
\s*
- zero or more whitespaces\[
- a [
char[^][]*
- zero or more chars other than [
and ]
]
- a ]
charTIP: If you want to keep texts inside brackets that are not digits only, i.e. if you only want to remove [123]
, [1]
, etc. but keep [Fig 1]
, [notes]
, etc., you should use \[\d+]
or \s*\[\d+]
.
Upvotes: 0
Reputation: 863801
I think you need:
df2 = pd.DataFrame({'Average Monthly Wage $': ['736[4]','7336[445]', '[4]345[5]']})
print (df2)
Average Monthly Wage $
0 736[4]
1 7336[445]
2 [4]345[5]
df2['Average Monthly Wage $'] = df2['Average Monthly Wage $'].str.replace(r'\[.*?\]','')
print (df2)
Average Monthly Wage $
0 736
1 7336
2 345
Upvotes: 3