pegasus123

Reputation: 109

Regex to remove (1) text within parentheses and (2) numbers

I can't seem to find the right regex for what I need. My data frame contains the following countries:

Switzerland17,
Iran (Islamic Republic of),
China, Hong Kong Special Administrative Region

I would like the 17 removed from "Switzerland17" and all text within parentheses removed. So far, I have only managed to do one or the other. P.S.: "China, Hong Kong Special Administrative Region" should remain unchanged.

My current incomplete code:

Energy['Country'] = Energy['Country'].str.replace("[^a-zA-Z]", '', regex=True)
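For reference, here is roughly what that character class does to the three sample values. This is a sketch using Python's re.sub directly, which performs the same per-string substitution that str.replace(..., regex=True) applies to each element of the column:

```python
import re

# The three values from the question
samples = [
    "Switzerland17",
    "Iran (Islamic Republic of)",
    "China, Hong Kong Special Administrative Region",
]

# [^a-zA-Z] deletes every non-letter: the digits, but also spaces,
# commas and the parenthesis characters (the words inside them stay)
letters_only = [re.sub(r"[^a-zA-Z]", "", s) for s in samples]
print(letters_only)
# ['Switzerland', 'IranIslamicRepublicof', 'ChinaHongKongSpecialAdministrativeRegion']
```

The digits are gone, but so are the spaces and commas, while the words inside the parentheses survive, which is why this approach cannot keep "China, Hong Kong Special Administrative Region" intact.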

Any suggestions?

Upvotes: 2


Answers (1)

Wiktor Stribiżew

Reputation: 627082

You can use

Energy['Country'] = Energy['Country'].str.replace(r"\s*\([^()]*\)|\d+", "", regex=True)
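A minimal sketch of the first pattern applied to a small DataFrame built from the three values in the question (the Energy frame here is only a stand-in for the asker's real data):

```python
import pandas as pd

# Stand-in for the asker's data: just the three values from the question
Energy = pd.DataFrame({
    "Country": [
        "Switzerland17",
        "Iran (Islamic Republic of)",
        "China, Hong Kong Special Administrative Region",
    ]
})

# Remove a parenthesized part (plus any whitespace before it) or any run of digits
Energy["Country"] = Energy["Country"].str.replace(r"\s*\([^()]*\)|\d+", "", regex=True)

print(Energy["Country"].tolist())
# ['Switzerland', 'Iran', 'China, Hong Kong Special Administrative Region']
```

The comma-separated name has no digits and no parentheses, so neither alternative matches and it is left as is.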

See the regex demo.

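In the first pattern, \s* belongs only to the parenthesized alternative. If you also need to remove optional whitespace before the digits, group the two alternatives after \s* so that it applies to both: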

Energy['Country'] = Energy['Country'].str.replace(r"\s*(?:\([^()]*\)|\d+)", "", regex=True)
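To illustrate the difference between the two patterns, the sketch below runs both through re.sub on a hypothetical value with a space before the digits ("Switzerland 17" does not appear in the question's data):

```python
import re

# Hypothetical value with a space before the digits (not from the question)
value = "Switzerland 17"

# First pattern: \s* is only part of the parenthesized alternative
first = re.sub(r"\s*\([^()]*\)|\d+", "", value)

# Second pattern: \s* precedes the group, so it applies to both alternatives
second = re.sub(r"\s*(?:\([^()]*\)|\d+)", "", value)

print(repr(first))   # 'Switzerland '
print(repr(second))  # 'Switzerland'
```

With the first pattern the space is left behind as trailing whitespace; with the grouped version it is consumed together with the digits.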

See this regex demo.

Details:

  • \s* - zero or more whitespaces
  • (?:\([^()]*\)|\d+) - a non-capturing group matching either
    • \([^()]*\) - a (, then zero or more characters other than ( and ), and then a )
    • | - or
    • \d+ - one or more digits
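The breakdown above maps directly onto Python's verbose regex mode. As a sketch (the name COUNTRY_NOISE is hypothetical), the grouped pattern can be written with one comment per part; re.VERBOSE ignores unescaped whitespace outside character classes and treats # as the start of a comment:

```python
import re

# Same pattern as r"\s*(?:\([^()]*\)|\d+)", written in verbose mode
COUNTRY_NOISE = re.compile(r"""
    \s*              # zero or more whitespace characters
    (?:              # non-capturing group matching either
        \( [^()]* \) #   a (, any chars other than ( and ), then a )
      |              #   or
        \d+          #   one or more digits
    )
""", re.VERBOSE)

samples = [
    "Switzerland17",
    "Iran (Islamic Republic of)",
    "China, Hong Kong Special Administrative Region",
]
cleaned = [COUNTRY_NOISE.sub("", s) for s in samples]
print(cleaned)
# ['Switzerland', 'Iran', 'China, Hong Kong Special Administrative Region']
```

A compiled pattern like this can also be passed as the pattern argument to str.replace together with regex=True.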

Upvotes: 1
