T.mum
T.mum

Reputation: 85

Regex in pandas dataframe

I have a dataframe like this a column like this

COL1      
RED[10%(INC)]
RED[12%(INC)]

and I want create col2 as this

COL2
10
12

Could cou help me to find the good regex? I tried this:

RED\[(\d+\.\d+) %INC\]

but it doesn't walk.

Upvotes: 4

Views: 912

Answers (3)

Simon
Simon

Reputation: 115

df['COL2']= np.where(df['COL1'].str.extract(r'RED\[(d+)')

Upvotes: 0

Ruthger Righart
Ruthger Righart

Reputation: 4921

This solution uses re.findall:

Modules and data:

import pandas as pd
df = pd.DataFrame({'COL1':['RED[10%(INC)','RED[12%(INC)']})

Solution:

df['COL2'] = df['COL1'].apply(lambda x: re.findall('[0-9]+', x))
df['COL2'] = pd.DataFrame(df['COL2'].tolist())

Upvotes: 0

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 626825

If you want to use your regex and only extract numbers in the specified context, you can use

df['COL2'] = df['COL1'].str.extract(r'RED\[(\d+(?:\.\d+)?)%\[INC]]', expand=False)

See the regex demo.

Details

  • RED\[ - a RED[ string
  • (\d+(?:\.\d+)?) - Capturing group 1: one or more digits followed with an optional sequence of a dot and one or more digits
  • %\[INC]] - a %[INC]] literal string.

You could also explore other options:

  • Extracting the number followed with a percentage sign: df['COL1'].str.extract(r'(\d+(?:\.\d+)?)%', expand=False)
  • Splitting with [, getting the second item and removing % from it: df['COL1'].str.split("[").str[1].str.replace("%", "")

Upvotes: 3

Related Questions