Reputation: 103
I have a pandas dataframe in which the date information is a string with the month and year:
date = ["JUN 17", "JULY 17", "AUG 18", "NOV 19"]
Note that the month is usually written as the 3 digit abbreviation, but is sometimes written as the full month for June and July.
I would like to convert this into a datetime format which assumes each date is on the first of the month:
date = [06-01-2017, 07-01-2017, 08-01-2018, 11-01-2019]
Edit to provide more information: Two main issues I wasn't sure how to handle:
df['date'] = pd.to_datetime(dict(year = df['Record Month'].astype(str).str[-2:], month = df['Record Month'].astype(str).str[0:3], day=1))
This has the error "Unable to parse string "JUN" at position 0
Upvotes: 1
Views: 881
Reputation: 4855
You were close with using pandas.to_datetime()
. Instead of using a dictionary though, you could just reformat the date strings to a more standard format. If you convert each date string into MMMYY format (pretty similar to what you were doing) you can pass the strftime format "%b%y"
to to_datetime()
and it will convert the strings into dates.
import pandas as pd
date = ["JUN 17", "JULY 17", "AUG 18", "NOV 19"]
df = pd.DataFrame(date, columns=["Record Month"])
df['date'] = pd.to_datetime(df["Record Month"].str[:3] + df["Record Month"].str[-2:], format='%b%y')
print(df)
Produces that following result:
Record Date date
0 JUN 17 2017-06-01
1 JULY 17 2017-07-01
2 AUG 18 2018-08-01
3 NOV 19 2019-11-01
Upvotes: 0
Reputation: 5183
If you are not sure of the many spellings that can show up then a dictionary mapping would not work. Perhaps your best chance is to split and slice so you normalize into year and month columns and then build the date.
If date
is a list as in your example.
date = [d.split() for d in date]
df = pd.DataFrame([m[:3].lower, '20' + y] for m, y in date],
# df = pd.DataFrame([[s.split()[0][:3].lower, '20' + s.split()[1]] for s in date],
columns=['month', 'year'])
Then pass a mapper to series.replace
as in
df.month = df.month.replace({'jan': 1, 'feb': 2 ...})
Then parse the dates from its components
# first cap the date to the first day of the month
df['day'] = 1
df = pd.to_datetime(df)
Upvotes: 1