Reputation: 430
So I have a dataframe called usa_sub_states with a column called 'state'.
In the state column there are 5 states whose names contain non-alphabetic characters:
Massachusetts[C]
Pennsylvania[C]
Rhode Island[D]
Virginia[C]
Hawai'i
Now I was wondering if there is a way to replace all the special characters with an empty string so that each one comes out as a regular state name. I tried:
usa_sub_states.state.replace(to_replace=r'[\W]', value='', regex=True, inplace=True)
but for some reason this deleted all the content of the column and left every value as an empty string.
Upvotes: 2
Views: 1041
Reputation: 627082
You can use
usa_sub_states.state.replace(to_replace=r'\[[^][]*]|\W', value='', regex=True, inplace=True)
See the regex demo. Details:
\[[^][]*] - a [, then any zero or more chars other than [ and ], then a closing ]
| - or
\W - any non-word char.
Upvotes: 2
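To check what the pattern does without the dataframe, here is a minimal sketch that applies it to the five names from the question using only the standard re module. The helper name clean and the second pattern KEEP_SPACES_PATTERN are assumptions added for illustration and are not part of the answer. Note that \W also matches spaces, so the answer's pattern turns "Rhode Island[D]" into "RhodeIsland"; the variant excludes whitespace from removal in case that is not wanted.

```python
import re

# Pattern from the answer: a bracketed group like [C], or any single non-word char.
ANSWER_PATTERN = re.compile(r"\[[^][]*]|\W")

# Hypothetical variant (not from the answer): same idea, but whitespace is kept.
KEEP_SPACES_PATTERN = re.compile(r"\[[^][]*]|[^\w\s]")

states = [
    "Massachusetts[C]",
    "Pennsylvania[C]",
    "Rhode Island[D]",
    "Virginia[C]",
    "Hawai'i",
]


def clean(name, pattern=ANSWER_PATTERN):
    """Remove every match of the pattern from a single state name."""
    return pattern.sub("", name)


cleaned = [clean(s) for s in states]
cleaned_keep_spaces = [clean(s, KEEP_SPACES_PATTERN) for s in states]

print(cleaned)
print(cleaned_keep_spaces)
```

In pandas the same pattern string should work with usa_sub_states['state'].str.replace(pattern, '', regex=True). Assigning the result back to the column is probably safer than inplace=True on a selected column, which may not update the dataframe in recent pandas versions.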