Reputation: 81
Background:
I have a pandas DataFrame containing a tweet column and a weather column. The DataFrame currently looks as follows:
Objective:
I am trying to extract the datestamp from the weather column (e.g. the datestamp for row index 0 is '(2020-07-14)') and save it in a new date column, so that I can filter on it, e.g. filter to the latest date.
I know how to convert a string column to a datestamp if the value were something like '20140512'. However, I have no idea how to identify a datestamp in the current format and extract it into a new column.
Any advice would be greatly appreciated.
Upvotes: 3
Views: 1226
Reputation: 4618
You could do something like this, assuming the datestamp is in the weather column and always has the same formatting:
df['date'] = pd.to_datetime(df['weather'].str.extract(r'\((\d{4}-\d{2}-\d{2})\)')[0])
or
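import re
# Raw-string pattern; rows with no match become NaT instead of raising AttributeError (needs Python 3.8+)
pat = re.compile(r'\((\d{4}-\d{2}-\d{2})\)')
df['date'] = pd.to_datetime(df['weather'].apply(lambda x: m.group(1) if (m := pat.search(x)) else None))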
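To tie this back to the goal of filtering to the latest date, here is a minimal end-to-end sketch. The sample rows are made up for illustration (the question's actual data isn't shown), and it assumes the datestamp always appears as '(YYYY-MM-DD)' somewhere in the weather string:

```python
import pandas as pd

# Hypothetical sample data; only the '(2020-07-14)' datestamp comes from the question.
df = pd.DataFrame({
    "tweet": ["first", "second", "third"],
    "weather": ["Sunny (2020-07-14)", "Rain (2020-07-15)", "no date here"],
})

# expand=False returns a Series when the pattern has a single capture group.
# Rows without a match come back as NaN and are converted to NaT.
df["date"] = pd.to_datetime(
    df["weather"].str.extract(r"\((\d{4}-\d{2}-\d{2})\)", expand=False),
    errors="coerce",
)

# Keep only the rows whose date equals the most recent one; max() skips NaT.
latest = df[df["date"] == df["date"].max()]
print(latest)
```

If several rows share the most recent date, all of them are kept, which is probably what you want when filtering rather than picking a single row.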
Upvotes: 1