user10724844

Regular expression and removing parentheses

I have a dataframe:

name
(John)
(Lily)
(Eddy)
Wang
Lisa

The dataframe is not correctly formatted, and I need to remove the parentheses. The result should be:

name
John
Lily
Eddy
Wang 
Lisa

My code is:

merge_df['name'] = merge_df['name'].replace('()','')

But this doesn't give me the result I want. Does anyone know how to fix this piece of code?

Upvotes: 0

Views: 727

Answers (2)

jadore801120

Reputation: 77

According to the official documentation, the first parameter of the string replace function is the substring to be replaced. So some_str.replace('()','') replaces every occurrence of the two-character substring () in your string. None of your names contains that substring, so replace returns the same string back.
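
A minimal sketch of that behaviour on plain Python strings. The sample values are made up for illustration and are not taken from the question's data:

```python
# str.replace searches for the literal substring "()",
# i.e. an opening parenthesis immediately followed by a closing one.
name = "(John)"

# "(" and ")" are never adjacent in "(John)", so nothing is replaced.
unchanged = name.replace("()", "")

# Here the two characters do sit side by side, so the pair is removed.
adjacent = "John()".replace("()", "")

print(repr(unchanged))
print(repr(adjacent))
```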

There are three ways to deal with it.

  1. Using multiple replace functions.

    Since you can only replace one kind of substring at a time, we can call it twice to get what we want.

    your_str = your_str.replace('(','').replace(')','')
    
  2. Using re.sub() from the regular expression library.

    The re.sub() (doc) function is more powerful: it can replace several different substrings in one call.

    I prefer this solution, since it is more flexible.

    import re
    your_str = re.sub(r'[\)\(]', '', your_str)
    
  3. Using str.strip() (doc)

    The str.strip() function removes characters only from the ends of your string, and you can specify several characters to remove. That makes it useful in this case.

    your_str = your_str.strip('()')
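
The three snippets above work on a single Python string. For a dataframe column like the one in the question, one possible translation uses the pandas `.str` accessor, sketched here. The `merge_df` built below is a hypothetical stand-in for the asker's data, and the intermediate variable names are illustrative only:

```python
import pandas as pd

# Hypothetical stand-in for the question's merge_df.
merge_df = pd.DataFrame({"name": ["(John)", "(Lily)", "(Eddy)", "Wang", "Lisa"]})

# 1. Chained literal replacements; regex=False treats "(" and ")" as plain text.
chained = (
    merge_df["name"]
    .str.replace("(", "", regex=False)
    .str.replace(")", "", regex=False)
)

# 2. One regular-expression replacement that covers both characters.
with_regex = merge_df["name"].str.replace(r"[()]", "", regex=True)

# 3. Strip parentheses from the two ends of each value only.
stripped = merge_df["name"].str.strip("()")

# Any of the three results can be assigned back to the column.
merge_df["name"] = stripped
print(merge_df)
```

Options 1 and 2 remove parentheses anywhere in the value, while option 3 leaves any parentheses in the middle of a name untouched, so the choice depends on what the rest of the data looks like.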
    

Upvotes: 1

Kota Mori

Reputation: 6750

.replace looks for an exact match of the whole value by default. You can explicitly specify that you want to use a regular expression, as below.

merge_df['name'] = merge_df['name'].replace(regex="[()]", value="")
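
To make the difference concrete, here is a small sketch comparing the two modes of `Series.replace`. The `names` series is invented for illustration; the third value is included only to show a case where the default mode does match:

```python
import pandas as pd

# Illustrative data, not the asker's dataframe.
names = pd.Series(["(John)", "Wang", "()"])

# Default mode: a cell is replaced only when its entire value equals "()".
exact = names.replace("()", "")

# Regex mode: every match of the pattern inside each string is replaced.
by_regex = names.replace(regex=r"[()]", value="")

print(exact.tolist())
print(by_regex.tolist())
```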

Upvotes: 1
