Reputation: 11
I have a date stored as a string. I know how to convert it to a datetime.datetime object when nothing is missing, but some of my values are missing and I couldn't get the conversion to work.
Let's say input_date is the raw date variable, which is a string. I want to produce an input_date_fmt variable of type datetime.datetime. I am trying to run the following:
DF['input_date_fmt'] = np.array([datetime.datetime.strptime(x, "%m/%d/%Y").date()
for x in DF['input_date']])
But the error is
ValueError: time data 'nan' does not match format '%m/%d/%Y'
Can anyone please help?
Upvotes: 1
Views: 3226
Reputation: 117337
If you have string values 'nan' in your dataframe:
>>> df = pd.DataFrame({'input_date':['01/01/2003', '02/29/2012', 'nan', '03/01/1995']})
>>> df
input_date
0 01/01/2003
1 02/29/2012
2 nan
3 03/01/1995
you can convert it to NaN before converting to date:
>>> df.loc[df['input_date'] == 'nan', 'input_date'] = np.nan
>>> df
input_date
0 01/01/2003
1 02/29/2012
2 NaN
3 03/01/1995
Then you can do your conversion. An easier way, though, is to use the vectorized to_datetime function to convert the strings to datetimes:
>>> df = pd.DataFrame({'input_date':['01/01/2003', '02/29/2012', 'nan', '03/01/1995']})
>>> pd.to_datetime(df['input_date'])
0 2003-01-01 00:00:00
1 2012-02-29 00:00:00
2 NaT
3 1995-03-01 00:00:00
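For the format in the question, a minimal sketch that passes the format explicitly and coerces anything unparseable to NaT. The errors='coerce' choice and the extra .dt.date column (named input_date_only here purely for illustration) are assumptions about what the asker wants, not part of the answer above:

```python
import pandas as pd

df = pd.DataFrame({'input_date': ['01/01/2003', '02/29/2012', 'nan', '03/01/1995']})

# Parse with an explicit format; values that cannot be parsed become NaT
# instead of raising ValueError.
parsed = pd.to_datetime(df['input_date'], format='%m/%d/%Y', errors='coerce')
df['input_date_fmt'] = parsed

# Optional: plain datetime.date objects (missing values stay NaT).
df['input_date_only'] = parsed.dt.date
```

The missing row can then be found with df['input_date_fmt'].isna().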
Upvotes: 2
Reputation: 191
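You can use a regexp to parse only valid dates (values that do not match become None, so the column keeps its length):
date_re = re.compile(r'(0[1-9]|1[012])/(0[1-9]|[12][0-9]|3[01])/(19|20)\d\d$')
DF['input_date_fmt'] = np.array([datetime.datetime.strptime(x, "%m/%d/%Y").date() if date_re.match(str(x)) else None
for x in DF['input_date']])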
But I agree with Satoru.Logic: what are you going to do with the invalid values?
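One possible way to handle invalid values, sketched under the assumption that mapping them to None is acceptable: let strptime do the validation and catch its exceptions. The helper name parse_date_or_none and the sample data are hypothetical:

```python
import datetime

import pandas as pd


def parse_date_or_none(value, fmt="%m/%d/%Y"):
    """Return a datetime.date, or None if value is missing or malformed."""
    try:
        return datetime.datetime.strptime(value, fmt).date()
    except (TypeError, ValueError):
        # TypeError: value is not a string (e.g. a float NaN).
        # ValueError: the string does not match the format (e.g. 'nan')
        # or is not a real calendar date (e.g. '02/30/2012').
        return None


DF = pd.DataFrame({'input_date': ['01/01/2003', 'nan', float('nan'), '02/30/2012']})
DF['input_date_fmt'] = [parse_date_or_none(x) for x in DF['input_date']]
```

Unlike a regexp alone, this also rejects strings that look like dates but are not, such as 02/30/2012.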
Upvotes: 0