TLanni
TLanni

Reputation: 340

extracting date using regex from the cell in pandas dataframe

I have a following dataframe

column 1   Description                          Extracted Data
date       January 15,2020 is important day

I want to get following result

column 1   Description                          Extracted Data
date       January 15,2020 is important day     January 15,2020

df.loc[df['column 1']=='date','Extracted Data']=df['Description'].str.extract(r'((January)|[/. ])|(\d{1,2}|[/., ]|\d{4})')

but I ma not getting desired result.Instead, i ma getting dataframe with all NaN values. How can I fix this?

Upvotes: 0

Views: 758

Answers (3)

Rahul Verma
Rahul Verma

Reputation: 3176

import dateutil.parser as dparser
import pandas as pd

df = pd.DataFrame({'column 1': ['date'], 'Description': ['January 15,2020 is important day']})
df['Extracted Data'] = df['Description'].apply(lambda x: dparser.parse(x,fuzzy=True).strftime('%B %d %Y'))
print(df)
  column 1                       Description   Extracted Data
0     date  January 15,2020 is important day  January 15 2020

Upvotes: 0

Strange
Strange

Reputation: 1540

This works:(Oneliner)

df['Extracted data'] = [re.match('[A-Za-z]+ \d{2},\d{4}',x)[0] for x in df['Description']]

output:

  column1                              Desc   Extracted data
0    date  January 15,2020 is important day  January 15,2020

Regex Link: https://regex101.com/r/ICDJCp/1

Upvotes: 1

Zaraki Kenpachi
Zaraki Kenpachi

Reputation: 5730

Use multi dot .* and digits.

import pandas as pd

df = pd.DataFrame({'column 1': ['date'], 'Description': ['January 15,2020 is important day']})
df['Extracted Data'] = df['Description'].str.extract(r'(.*,\d{4})')

Output:

  column 1                       Description   Extracted Data
0     date  January 15,2020 is important day  January 15,2020

Upvotes: 1

Related Questions