Reputation: 181
I have a dataset extracted from a PDF file that contains a date column in the format JAN2021. I want to convert it to a datetime that displays only the month and year, as JAN-2021. I tried the statement below, but it did not work. Is there any way to do this without altering the data (that is, without adding a day to the month)?
df2['MONTH1'] = pd.to_datetime(df2['MONTH1'], format="%mmm%YY")
Any feedback or tips would be greatly appreciated.
Upvotes: 0
Views: 1973
Reputation: 23207
You can use pd.to_datetime to convert the month-year string (e.g. JAN2021) to datetime format. Then use dt.strftime() to format the datetime objects into the required mmm-YYYY layout (e.g. Jan-2021), as follows:
(Assuming your dataframe is called df and the column name is Col1):
df['Col_new'] = pd.to_datetime(df['Col1'], format='%b%Y').dt.strftime('%b-%Y')
Result:
print(df)
      Col1   Col_new
0  JAN2021  Jan-2021
1  FEB2021  Feb-2021
2  MAR2021  Mar-2021
3  APR2021  Apr-2021
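The format string "%mmm%YY" in your trial code is invalid: %m matches a numeric month (01 to 12), whereas %b matches an abbreviated month name such as Jan, and a single %Y already matches the four-digit year. For a complete list of valid format codes and their meanings, with examples, see the official Python documentation on strftime() and strptime() format codes.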
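Since the question asks for JAN-2021 in upper case, here is a minimal sketch of one way to get that, assuming an English locale for month names. The sample values and the column names Col1 and Col_new are illustrative only. Note that the result is a column of strings, not datetimes.

```python
import pandas as pd

# Hypothetical sample data mirroring the JAN2021 layout from the question.
df = pd.DataFrame({"Col1": ["JAN2021", "FEB2021", "MAR2021"]})

# Parse the text with %b (abbreviated month name) and %Y (four-digit year).
parsed = pd.to_datetime(df["Col1"], format="%b%Y")

# Format back to text as Mon-YYYY, then upper-case the month abbreviation.
# The new column holds strings, so no day is ever displayed.
df["Col_new"] = parsed.dt.strftime("%b-%Y").str.upper()

print(df["Col_new"].tolist())
```

If the values are needed for date arithmetic or sorting later, it may be worth keeping the parsed datetime column alongside the formatted text.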
Upvotes: 2
Reputation: 2670
Converting to a datetime will always add a day, because a Timestamp represents a specific point in time.
https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html
When assembling datetimes from DataFrame columns, pd.to_datetime looks for standard designations of the datetime components in the column names, including:
required: year, month, day
optional: hour, minute, second, millisecond, microsecond, nanosecond
Hence, when you convert a month-year string, the missing day defaults to the first of the month and the time to midnight:
pd.to_datetime('JAN2021', format='%b%Y')
>> Timestamp('2021-01-01 00:00:00')
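If a day-free value that still behaves like a date is wanted, one option not mentioned above is a monthly Period. The following is a minimal sketch under that assumption, again assuming an English locale; the Series s and the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical sample data in the same JAN2021 layout.
s = pd.Series(["JAN2021", "FEB2021", "MAR2021"])

# Parse to datetime first, then convert to monthly periods.
# A monthly Period represents the whole month rather than a single day.
periods = pd.to_datetime(s, format="%b%Y").dt.to_period("M")

# Periods can still be formatted as text when a display label is needed.
labels = periods.dt.strftime("%b-%Y").str.upper()

print(periods.tolist())
print(labels.tolist())
```

Periods sort chronologically and support month arithmetic, which plain strings such as JAN-2021 do not.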
Upvotes: 1