Regex to remove 1. text within parentheses 2. numbers

Question

I can't seem to find the right regex for what I need. My data frame contains the following countries:

Switzerland17,
Iran (Islamic Republic of),
China, Hong Kong Special Administrative Region

I would like 17 to be removed from Switzerland, and all text within parentheses (including the parentheses themselves) to be removed. So far, I have only managed to do one or the other. P.S.: "China, Hong Kong Special Administrative Region" should remain the same.

My current incomplete code:

Energy['Country'] = Energy['Country'].str.replace("[^a-zA-Z]",'')

Any suggestions?

Wiktor Stribiżew · Accepted Answer

You can use

Energy['Country'] = Energy['Country'].str.replace(r"\s*\([^()]*\)|\d+", "", regex=True)

See the regex demo.

If you also need to remove optional whitespace before digits, you can group the two patterns after \s*:

Energy['Country'] = Energy['Country'].str.replace(r"\s*(?:\([^()]*\)|\d+)", "", regex=True)

See this regex demo.

Details:

\s* - zero or more whitespaces
(?:\([^()]*\)|\d+) - a non-capturing group matching either
- \([^()]*\) - a (, then zero or more chars other than ( and ) and then a )
- | - or
- \d+ - one or more digits
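
As a quick check of the two patterns above, here is a minimal sketch that applies them with Python's built-in re module to the sample values from the question. It assumes pandas' str.replace(..., regex=True) behaves like re.sub for these patterns; the names FIRST, GROUPED and clean_country are hypothetical helpers introduced only for illustration.

```python
import re

# First pattern: optional whitespace is tied only to the parenthesized part.
FIRST = re.compile(r"\s*\([^()]*\)|\d+")
# Grouped pattern: optional whitespace is removed before either alternative.
GROUPED = re.compile(r"\s*(?:\([^()]*\)|\d+)")


def clean_country(name: str) -> str:
    """Hypothetical helper: drop parenthesized text and digits from a name."""
    return GROUPED.sub("", name)


samples = [
    "Switzerland17",
    "Iran (Islamic Republic of)",
    "China, Hong Kong Special Administrative Region",
]
cleaned = [clean_country(s) for s in samples]
print(cleaned)

# The two patterns differ only when whitespace precedes the digits.
print(repr(FIRST.sub("", "Switzerland 17")))
print(repr(GROUPED.sub("", "Switzerland 17")))
```

With the grouped pattern, a value such as "Switzerland 17" loses the space before the digits as well, while the first pattern leaves a trailing space behind.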
