Achillies

Reputation: 11

String Split using pandas

I tried using the following code:

df1['company_etrim']=df1['company_trim'].str.replace(r'[0-9()%]', "").str.join('')

The result (shown in the screenshot) drops the leading 3 from 3M CO and from 3SBIO INC, but I want to keep those digits.

My expected result should keep any digits at the start, in the middle, or at the end of the company name, but it should not include the digits inside the parentheses or the parentheses themselves.

Upvotes: 0

Views: 102

Answers (2)

Wiktor Stribiżew

Reputation: 626794

If every value in your column follows that pattern, you do not need a regex: rsplit each value once on a space and keep the first part:

>>> import pandas as pd
>>> df1 = pd.DataFrame({'company_etrim': ['3M CO (95%);', '3SBIO INC (96%);']})
>>> df1['company_etrim'].str.rsplit(' ', n=1).str[0]
0        3M CO
1    3SBIO INC
Name: company_etrim, dtype: object
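
One caveat worth checking before relying on rsplit: it removes whatever follows the last space, whether or not that is a percentage. A minimal sketch, assuming a made-up row without the "(NN%);" suffix (the 'APPLE INC' value is hypothetical and not from the question):

```python
import pandas as pd

# Hypothetical data: the last row has no "(NN%);" suffix.
s = pd.Series(['3M CO (95%);', '3SBIO INC (96%);', 'APPLE INC'])

# rsplit with n=1 cuts at the last space, so the final word is
# dropped even when it is part of the company name.
trimmed = s.str.rsplit(' ', n=1).str[0]
print(trimmed.tolist())  # ['3M CO', '3SBIO INC', 'APPLE']
```

If rows like the last one can occur in your data, the plain split is probably not safe.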

If you want to remove only the percentages inside parentheses that are followed by ;, you can use a regex approach:

>>> df1['company_etrim'].str.replace(r'\s*\(\d+%\);', '', regex=True)
0        3M CO
1    3SBIO INC
Name: company_etrim, dtype: object

The \s*\(\d+%\); regex matches

  • \s* - zero or more whitespaces
  • \( - a ( char
  • \d+ - one or more digits
  • %\); - a %); string.
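
To see that the pattern touches only the parenthesized percentage, here is a small sketch using the standard-library re module instead of pandas. The third sample string (a name with no suffix) is an assumption added for illustration:

```python
import re

# Same pattern as in the answer, compiled with the standard library.
pattern = re.compile(r'\s*\(\d+%\);')

# The last sample is hypothetical: a name with no "(NN%);" suffix.
samples = ['3M CO (95%);', '3SBIO INC (96%);', '3M CO']

# Digits outside the parentheses are never matched, so they survive.
cleaned = [pattern.sub('', s) for s in samples]
print(cleaned)  # ['3M CO', '3SBIO INC', '3M CO']
```

A value without the suffix passes through unchanged, because the pattern simply finds no match.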

Upvotes: 1

Muhammed Jaseem

Reputation: 830

Try this:

df1['company_etrim'] = df1['company_trim'].str.replace(r'\([0-9]+%\)', '', regex=True)

Upvotes: 0
