Reputation: 149
I've tried to apply the 'replace' method on this list in a dataframe:
df=pd.DataFrame({'DC': ['DOUBLE CHANCE\n1 OR X\n2.50\nX OR 2\n1.12\n1 OR 2\n1.20']})
Desired result: 2.50\n1.12\n1.20
Looking for suggestion to make the cleanup work either 'replace' method or 'regex'
import pandas as pd df=pd.DataFrame({'DC': ['DOUBLE CHANCE\n1 OR X\n2.50\nX OR 2\n1.12\n1 OR 2\n1.20']}) df = df['DC']['Double_Chance'].str.replace(r'([^\d\.\n])','').str.replace(r'1\n','').str.replace(r'2\n','').str.replace(r'12\n','').str.strip() df 0 2.50\n1.111.20 Name: Score, dtype: object
Upvotes: 1
Views: 43
Reputation: 627082
You can use Series.str.replace
:
df['Double_Chance'] = df['DC'].str.replace(r'(?m)^(?!\d+\.\d+$).*\n*', '')
Or, you may use Series.str.findall
:
df['Double_Chance'] = df['DC'].str.findall(r'(?m)^\d+\.\d+$').str.join("\n")
Both produce 2.50\n1.12\n1.20
.
See the regex demo. Details:
(?m)
- the re.M
option that makes ^
match start of each line^
- start of a line(?!\d+\.\d+$)
- fail the match if the line is a float number.*
- zero or more chars other than line break chars, as many as possible\n*
- zero or more line feed chars.Upvotes: 1