Tad

Reputation: 913

How to reformat date data in Pandas dataframe

My input dataframe is

import pandas as pd

df = pd.DataFrame({'Source': ['Pre-Nov 2017', 'Pre-Nov 2017', 'Oct 19', '2019-04-01 00:00:00', '2019-06-01 00:00:00', 'Nov 17-Nov 18', 'Nov 17-Nov 18']})

I need a Target column as shown below:

[Image: expected Target column]

The code below does not work: the Target column ends up with the same values as Source.

df['Target'] = pd.to_datetime(df['Source'], format='%b %Y', errors='ignore')

It looks like pandas treats values such as '2019-04-01 00:00:00' and '2019-06-01 00:00:00' as NaN.

Upvotes: 1

Views: 181

Answers (1)

jezrael

Reputation: 863751

One idea is to use errors='coerce', which turns values that cannot be parsed as datetimes into NaT, then convert the result to custom strings with Series.dt.strftime. The NaT values also come out as strings ('NaT'), so use Series.mask to replace them with the original values:

df['Target'] = (pd.to_datetime(df['Source'], errors='coerce')
                  .dt.strftime('%b %y')
                  .mask(lambda x: x == 'NaT', df['Source']))
print(df)

                Source         Target
0         Pre-Nov 2017   Pre-Nov 2017
1         Pre-Nov 2017   Pre-Nov 2017
2               Oct 19         Oct 19
3  2019-04-01 00:00:00         Apr 19
4  2019-06-01 00:00:00         Jun 19
5        Nov 17-Nov 18  Nov 17-Nov 18
6        Nov 17-Nov 18  Nov 17-Nov 18

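An alternative is to use numpy.where:

import numpy as np

# Values that cannot be parsed become NaT
d = pd.to_datetime(df['Source'], errors='coerce')
# Keep the original string where parsing failed, otherwise format as e.g. 'Apr 19'
df['Target'] = np.where(d.isna(), df['Source'], d.dt.strftime('%b %y'))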
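Both variants above let pandas guess the format, so the result can depend on the pandas version and on how ambiguous strings such as 'Oct 19' are interpreted (it may be read as 19 October of the current year). A minimal sketch of a stricter variant follows, assuming that only the full 'YYYY-MM-DD HH:MM:SS' timestamps should be reformatted. That assumption and the use of Series.where are not from the original answer.

```python
import pandas as pd

df = pd.DataFrame({'Source': ['Pre-Nov 2017', 'Pre-Nov 2017', 'Oct 19',
                              '2019-04-01 00:00:00', '2019-06-01 00:00:00',
                              'Nov 17-Nov 18', 'Nov 17-Nov 18']})

# Assumption: only strings matching this exact format count as dates;
# everything else is coerced to NaT.
d = pd.to_datetime(df['Source'], format='%Y-%m-%d %H:%M:%S', errors='coerce')

# Keep the formatted string where parsing succeeded, else the original value.
df['Target'] = d.dt.strftime('%b %y').where(d.notna(), df['Source'])
print(df)
```

Selecting on d.notna() rather than comparing against the string 'NaT' avoids depending on how strftime represents missing values.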

EDIT:

but why did this not work?

df['Target'] = pd.to_datetime(df['Source'], format='%b %Y', errors='ignore')

According to the to_datetime documentation, errors='ignore' returns the original values of the column if the conversion fails:

If 'ignore', then invalid parsing will return the input
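To see that the conversion really does fail for format='%b %Y', here is a small sketch using a shortened, hypothetical sample of the Source values. It uses errors='coerce' and the default errors='raise' rather than errors='ignore', which, as far as I know, is deprecated in recent pandas releases.

```python
import pandas as pd

# Hypothetical sample: three of the value shapes from the question.
s = pd.Series(['Pre-Nov 2017', 'Oct 19', '2019-04-01 00:00:00'])

# With errors='coerce', values that do not match the format become NaT.
parsed = pd.to_datetime(s, format='%b %Y', errors='coerce')
all_failed = bool(parsed.isna().all())
print(all_failed)

# With the default errors='raise', the same mismatch raises an exception.
try:
    pd.to_datetime(s, format='%b %Y')
    raised = False
except ValueError:
    raised = True
print(raised)
```

None of the sample values has the shape 'Nov 2017' that '%b %Y' expects ('%Y' needs a four-digit year), so the parse fails and errors='ignore' hands the input back unchanged.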

Upvotes: 1
