Reputation: 340
I have a following dataframe
column 1 Description Extracted Data
date January 15,2020 is important day
I want to get following result
column 1 Description Extracted Data
date January 15,2020 is important day January 15,2020
df.loc[df['column 1']=='date','Extracted Data']=df['Description'].str.extract(r'((January)|[/. ])|(\d{1,2}|[/., ]|\d{4})')
but I ma not getting desired result.Instead, i ma getting dataframe with all NaN values. How can I fix this?
Upvotes: 0
Views: 758
Reputation: 3176
import dateutil.parser as dparser
import pandas as pd
df = pd.DataFrame({'column 1': ['date'], 'Description': ['January 15,2020 is important day']})
df['Extracted Data'] = df['Description'].apply(lambda x: dparser.parse(x,fuzzy=True).strftime('%B %d %Y'))
print(df)
column 1 Description Extracted Data
0 date January 15,2020 is important day January 15 2020
Upvotes: 0
Reputation: 1540
This works:(Oneliner)
df['Extracted data'] = [re.match('[A-Za-z]+ \d{2},\d{4}',x)[0] for x in df['Description']]
output:
column1 Desc Extracted data
0 date January 15,2020 is important day January 15,2020
Regex Link: https://regex101.com/r/ICDJCp/1
Upvotes: 1
Reputation: 5730
Use multi dot .*
and digits.
import pandas as pd
df = pd.DataFrame({'column 1': ['date'], 'Description': ['January 15,2020 is important day']})
df['Extracted Data'] = df['Description'].str.extract(r'(.*,\d{4})')
Output:
column 1 Description Extracted Data
0 date January 15,2020 is important day January 15,2020
Upvotes: 1