swordlordswamplord

Reputation: 430

How to remove multiple patterns of special characters in Python Regex Dataframe

I have a DataFrame called usa_sub_states with a column called 'state'.

Five of the values in that column contain non-alphabetic characters:

Massachusetts[C]
Pennsylvania[C]
Rhode Island[D]
Virginia[C]
Hawai'i

Is there a way to replace all the special characters with an empty string so that each value comes out as a regular state name? I tried:

usa_sub_states.state.replace(to_replace=r'[\W]', value='', regex=True, inplace=True) 

But for some reason this deleted all the content of the column, leaving every value as an empty string.

Upvotes: 2

Views: 1041

Answers (1)

Wiktor Stribiżew

Reputation: 627082

You can use

usa_sub_states.state.replace(to_replace=r'\[[^][]*]|\W', value='', regex=True, inplace=True)

See the regex demo. Details:

  • \[[^][]*] - a [, then any zero or more chars other than [ and ], and then a ]
  • | - or
  • \W - any non-word char.
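
To check what the pattern does to the five values from the question, here is a minimal sketch that uses the standard-library re module as a stand-in for the pandas call (pandas hands regex=True patterns to re, so the matching should behave the same way). One thing to watch: \W also matches whitespace, so "Rhode Island" loses its space. The second pattern, keep_spaces_pattern, is not part of the answer above; it is a hypothetical variant that swaps \W for [^\w\s] so spaces survive.

```python
import re

# The five values listed in the question.
states = [
    "Massachusetts[C]",
    "Pennsylvania[C]",
    "Rhode Island[D]",
    "Virginia[C]",
    "Hawai'i",
]

# Pattern from the answer: a bracketed marker such as [C], or any non-word char.
answer_pattern = re.compile(r"\[[^][]*]|\W")

# Hypothetical variant (an assumption, not from the answer): same bracket rule,
# but only strip characters that are neither word chars nor whitespace.
keep_spaces_pattern = re.compile(r"\[[^][]*]|[^\w\s]")

cleaned_answer = [answer_pattern.sub("", s) for s in states]
cleaned_keep_spaces = [keep_spaces_pattern.sub("", s) for s in states]

for original, a, b in zip(states, cleaned_answer, cleaned_keep_spaces):
    print(f"{original!r:22} -> {a!r:18} | {b!r}")
```

With the answer's pattern, "Rhode Island[D]" becomes "RhodeIsland"; with the variant it becomes "Rhode Island". In pandas, either pattern can be passed as to_replace with regex=True, and assigning the result back to the column (rather than relying on inplace=True) is probably the safer way to apply it.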

Upvotes: 2
