Reputation: 11
I tried using the following code:
df1['company_etrim']=df1['company_trim'].str.replace(r'[0-9()%]', "").str.join('')
and got the result shown in the image above, which dropped the leading 3 from 3M CO and from 3SBIO INC. I want to keep those digits.
My expected result should keep any digits at the start, in the middle, or at the end of the name, but it should not return the parentheses or any numbers inside them.
Upvotes: 0
Views: 102
Reputation: 626794
If all the values in your column look like that, you do not need a regex. You can rsplit
the column once on a space and keep the first part:
>>> import pandas as pd
>>> df1 = pd.DataFrame({'company_etrim': ['3M CO (95%);', '3SBIO INC (96%);']})
>>> df1['company_etrim'].str.rsplit(' ', n=1).str[0]
0        3M CO
1    3SBIO INC
Name: company_etrim, dtype: object
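The "all like that" condition matters, because rsplit drops whatever follows the last space, whether or not it is a parenthesized percentage. Here is a minimal sketch in plain Python, where str.rsplit treats each value the way .str.rsplit treats each row. The value 'APPLE INC' is a hypothetical example and does not come from the question:

```python
# Only the first two values come from the question; 'APPLE INC' is hypothetical.
names = ['3M CO (95%);', '3SBIO INC (96%);', 'APPLE INC']

# Split once from the right on a space and keep the left-hand part.
trimmed = [name.rsplit(' ', 1)[0] for name in names]

print(trimmed)  # ['3M CO', '3SBIO INC', 'APPLE']
```

The last value loses its final word, so this approach is only safe when every row ends with the " (NN%);" suffix.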
If you want to remove only a percentage inside parentheses that is followed by a ;
you can use a regex approach:
>>> df1['company_etrim'].str.replace(r'\s*\(\d+%\);', '', regex=True)
0        3M CO
1    3SBIO INC
Name: company_etrim, dtype: object
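The same pattern can be tried outside pandas with re.sub. This is a sketch only: the third sample value is hypothetical, added to show that digits outside the parentheses are left alone:

```python
import re

# Pattern from the answer: optional whitespace, "(", digits, "%", ")", ";".
pattern = re.compile(r'\s*\(\d+%\);')

# The last value is hypothetical and not taken from the question.
names = ['3M CO (95%);', '3SBIO INC (96%);', 'FORM 10 CORP 2 (7%);']

# Remove only the parenthesized percentage; everything else is kept.
cleaned = [pattern.sub('', name) for name in names]

print(cleaned)  # ['3M CO', '3SBIO INC', 'FORM 10 CORP 2']
```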
The \s*\(\d+%\); regex matches:
\s* - zero or more whitespace characters
\( - a ( char
\d+ - one or more digits
%\); - a %); string.
Upvotes: 1