Reputation: 913
My input dataframe is
df = pd.DataFrame({'Source':['Pre-Nov 2017', 'Pre-Nov 2017', 'Oct 19', '2019-04-01 00:00:00', '2019-06-01 00:00:00', 'Nov 17-Nov 18', 'Nov 17-Nov 18']})
I would need Target column as below
If I use the code below, it doesn't work: the Target column just contains the same values as Source.
df['Target'] = pd.to_datetime(df['Source'], format= '%b %Y',errors='ignore')
It looks like pandas is treating values like '2019-04-01 00:00:00' and '2019-06-01 00:00:00' as NaN.
Upvotes: 1
Views: 181
Reputation: 863751
One idea is to use errors='coerce', which gives missing values (NaT) where the strings cannot be parsed as datetimes, and then convert to custom strings with Series.dt.strftime. After that the NaT values are strings too, so use Series.mask to replace them with the original values:
df['Target'] = (pd.to_datetime(df['Source'], errors='coerce')
                  .dt.strftime('%b %y')
                  .mask(lambda x: x == 'NaT', df['Source']))
print(df)
                Source         Target
0         Pre-Nov 2017   Pre-Nov 2017
1         Pre-Nov 2017   Pre-Nov 2017
2               Oct 19         Oct 19
3  2019-04-01 00:00:00         Apr 19
4  2019-06-01 00:00:00         Jun 19
5        Nov 17-Nov 18  Nov 17-Nov 18
6        Nov 17-Nov 18  Nov 17-Nov 18
An alternative is to use numpy.where:
d = pd.to_datetime(df['Source'], errors='coerce')
df['Target'] = np.where(d.isna(), df['Source'], d.dt.strftime('%b %y'))
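Both versions above let pandas guess the format, so the result can depend on the pandas version: a lenient parser may read 'Oct 19' as 19 October of the current year, and as far as I know newer releases return NaN rather than the string 'NaT' from dt.strftime. A minimal sketch that avoids both, assuming only the full 'YYYY-MM-DD HH:MM:SS' strings should be converted (that format is my assumption, taken from the sample data):

```python
import pandas as pd

df = pd.DataFrame({'Source': ['Pre-Nov 2017', 'Pre-Nov 2017', 'Oct 19',
                              '2019-04-01 00:00:00', '2019-06-01 00:00:00',
                              'Nov 17-Nov 18', 'Nov 17-Nov 18']})

# Assumption: only strings in this exact timestamp layout are real dates.
# Everything else becomes NaT because of errors='coerce'.
d = pd.to_datetime(df['Source'], format='%Y-%m-%d %H:%M:%S', errors='coerce')

# Keep the reformatted date where parsing succeeded,
# otherwise fall back to the original text from Source.
df['Target'] = d.dt.strftime('%b %y').where(d.notna(), df['Source'])
print(df)
```

Testing d.notna() instead of comparing against 'NaT' means it does not matter how strftime represents the missing values.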
EDIT:
but why did this not work?
df['Target'] = pd.to_datetime(df['Source'], format= '%b %Y',errors='ignore')
If you check the to_datetime documentation, with errors='ignore' it returns the same values of the column if the conversion fails:
If 'ignore', then invalid parsing will return the input
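To see why the conversion fails in the first place, here is a rough check of the sample values against '%b %Y' using the standard library. pandas has its own parser, so this is only an approximation of what to_datetime does, and the matches helper is a name made up for the illustration:

```python
from datetime import datetime

values = ['Pre-Nov 2017', 'Oct 19', '2019-04-01 00:00:00', 'Nov 17-Nov 18']

def matches(value, fmt):
    """Return True if the whole string parses with the given format."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False

# '%b %Y' expects an abbreviated month name followed by a 4-digit year.
matched = [v for v in values if matches(v, '%b %Y')]
print(matched)                        # no sample value fits the format
print(matches('Oct 2019', '%b %Y'))   # a string that does fit
```

None of the sample strings fit the format ('Oct 19' has a 2-digit year, which would need %y), so parsing is invalid and errors='ignore' hands back the input unchanged.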
Upvotes: 1