Reputation: 61
So I have the Edition Column which contains data in unevenly pattern, as some have ',' followed by the date and some have ',-' pattern.
df.head()
17 Paperback,– 1 Nov 2016
18 Mass Market Paperback,– 1 Jan 1991
19 Paperback,– 2016
20 Hardcover,– 24 Nov 2018
21 Paperback,– Import, 4 Oct 2018
How can I extract the date to a separate column. I tried using str.split() but can't find specific pattern to extract.Is there any method I could do it?
Upvotes: 1
Views: 86
Reputation: 1413
Try using dateutil
from dateutil.parser import parse
df['Dt']=[parse(i, fuzzy_with_tokens=True)[0] for i in df['column']]
Upvotes: 2
Reputation: 5601
obj = df['Edition']
obj.str.split('((?:\d+\s+\w+\s+)?\d{4}$)', expand=True)
or
obj.str.split('[,–]+').str[0]
obj.str.split('[,–]+').str[-1] # date
Upvotes: 3