Reputation: 85
I have a dataframe like this a column like this
COL1
RED[10%(INC)]
RED[12%(INC)]
and I want create col2
as this
COL2
10
12
Could cou help me to find the good regex? I tried this:
RED\[(\d+\.\d+) %INC\]
but it doesn't walk.
Upvotes: 4
Views: 912
Reputation: 4921
This solution uses re.findall
:
Modules and data:
import pandas as pd
df = pd.DataFrame({'COL1':['RED[10%(INC)','RED[12%(INC)']})
Solution:
df['COL2'] = df['COL1'].apply(lambda x: re.findall('[0-9]+', x))
df['COL2'] = pd.DataFrame(df['COL2'].tolist())
Upvotes: 0
Reputation: 626825
If you want to use your regex and only extract numbers in the specified context, you can use
df['COL2'] = df['COL1'].str.extract(r'RED\[(\d+(?:\.\d+)?)%\[INC]]', expand=False)
See the regex demo.
Details
RED\[
- a RED[
string(\d+(?:\.\d+)?)
- Capturing group 1: one or more digits followed with an optional sequence of a dot and one or more digits%\[INC]]
- a %[INC]]
literal string.You could also explore other options:
df['COL1'].str.extract(r'(\d+(?:\.\d+)?)%', expand=False)
[
, getting the second item and removing %
from it: df['COL1'].str.split("[").str[1].str.replace("%", "")
Upvotes: 3