Reputation: 109
I can't seem to find the right regex for what I need. My data frame contains the following countries:
Switzerland17,
Iran (Islamic Republic of),
China, Hong Kong Special Administrative Region
I would like the 17 to be removed from Switzerland and all text within parentheses to be removed. So far, I have only managed to do one or the other. P.S.: "China, Hong Kong Special Administrative Region" should remain the same.
My current incomplete code:
Energy['Country'] = Energy['Country'].str.replace("[^a-zA-Z]",'')
Any suggestions?
Upvotes: 2
Views: 71
Reputation: 627082
You can use
Energy['Country'] = Energy['Country'].str.replace(r"\s*\([^()]*\)|\d+", "", regex=True)
See the regex demo.
If you also need to remove optional whitespace before digits, you can group the two patterns after \s*:
Energy['Country'] = Energy['Country'].str.replace(r"\s*(?:\([^()]*\)|\d+)", "", regex=True)
See this regex demo.
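To see where the two patterns differ, here is a minimal sketch that applies both to the sample values from the question. It uses the standard `re` module rather than pandas so it runs without extra dependencies; the same pattern strings are what get passed to `str.replace(..., regex=True)`. The fourth value, "Switzerland 17", is a hypothetical input that is not in the original data, added only to show the effect of the grouping.

```python
import re

# Sample values from the question, plus one hypothetical value
# with a space before the digits (not in the original data).
countries = [
    "Switzerland17",
    "Iran (Islamic Republic of)",
    "China, Hong Kong Special Administrative Region",
    "Switzerland 17",  # hypothetical
]

# Here \s* belongs only to the parenthesised alternative.
ungrouped = r"\s*\([^()]*\)|\d+"
# Here \s* precedes a group, so it applies to both alternatives.
grouped = r"\s*(?:\([^()]*\)|\d+)"

cleaned_ungrouped = [re.sub(ungrouped, "", c) for c in countries]
cleaned_grouped = [re.sub(grouped, "", c) for c in countries]

for original, a, b in zip(countries, cleaned_ungrouped, cleaned_grouped):
    print(f"{original!r:55} {a!r:55} {b!r}")
```

Both patterns give the same result for the three values in the question. Only the hypothetical "Switzerland 17" differs: the ungrouped pattern leaves a trailing space, while the grouped one removes it.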
Details:
\s* - zero or more whitespaces
(?:\([^()]*\)|\d+) - a non-capturing group matching either
  \([^()]*\) - a (, then zero or more chars other than ( and ), and then a )
  | - or
  \d+ - one or more digits
Upvotes: 1
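The breakdown of the grouped pattern can also be written directly into the regex using Python's `re.VERBOSE` flag, which ignores whitespace in the pattern and allows `#` comments. This is only a sketch of an alternative way to write the same pattern; the name `PATTERN` and the helper `clean_country` are illustrative and not part of the original answer.

```python
import re

# Same pattern as the grouped version, spelled out with comments.
PATTERN = re.compile(
    r"""
    \s*              # zero or more whitespace characters
    (?:              # non-capturing group matching either
        \( [^()]* \) #   an opening parenthesis, any non-parenthesis chars, a closing parenthesis
      |              #   or
        \d+          #   one or more digits
    )
    """,
    re.VERBOSE,
)


def clean_country(name: str) -> str:
    """Remove digits and parenthesised text (hypothetical helper)."""
    return PATTERN.sub("", name)


print(clean_country("Switzerland17"))
print(clean_country("Iran (Islamic Republic of)"))
print(clean_country("China, Hong Kong Special Administrative Region"))
```

The verbose form matches exactly the same text as the one-line pattern; it is only easier to read and maintain.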