Reputation: 1721
I have a dataframe that lists the date in several formats. For example:
date
201 July 17,2019
555 06-06-2019
295 28/3/2019
Is there a way to translate all of those dates to a datetime
?
Upvotes: 4
Views: 1489
Reputation: 8035
Edit
below for better answer.Normally for a task like this you would use pandas.Series.dt.strftime()
to convert the entire column to a specific DateTime
format, but since it seems you have different formats in different rows you should probably instead look intodatetime.strptime()
.
Traverse through that column of Date
an through if
(or try
) statement's changes the dates (pseudocode for demonstration):
for date in date_column:
if "," in date:
# strptime()
elif "-" in date:
# strptime()
elif "/" in date:
dt = datetime.strptime(date, "%d/%m/%y")
column_row = dt
Here is the strftime()
format codes, which you will need to set the second param of the function.
You may also do this by utilizing dateutil.parser.parse
(make sure to install it first via pip install python-dateutil
):
from dateutil.parser import parse
df['date'] = df.date.apply(parse)
So for example:
from dateutil.parser import parse
import pandas as pd
import numpy as np
def parse_date(date):
if date is np.nan:
return np.nan
else:
return parse(date)
df = pd.DataFrame(
{
"date": [
"July 17,2019",
"06-06-2019",
"28/3/2019",
"12/3/19",
"31-12-19",
np.nan,
],
"text": ["t1", "t2", "t3", "t4", "t5", "t6"],
}
)
df["date"] = df.date.apply(parse_date)
Which converts:
date text
0 July 17,2019 t1
1 06-06-2019 t2
2 28/3/2019 t3
3 12/3/19 t4
4 31-12-19 t5
5 NaN t6
to
date text
0 2019-07-17 t1
1 2019-06-06 t2
2 2019-03-28 t3
3 2019-12-03 t4
4 2019-12-31 t5
5 NaT t6
You can also accomplish the same as the function by using a lambda
expression as following:
df["date"] = df.date.apply(lambda date: np.nan if date is np.nan else parse(date))
Upvotes: 3
Reputation: 7224
I would create a an apply function that checks for a regex, and returns the datetime based on that regex, running through a list of if/elif commands and returning the result, something like:
def checkdate(x):
if pd.Series(x).str.contains(r'\w+ \d+,\d+')[0]:
return datetime.datetime.strptime(x, '%B %d,%Y')
#elif next str.contains, etc
else:
return np.nan
df.date.apply(checkdate)
sample out
0 2019-07-17
1 NaT
2 NaT
and then you can reset the date column to those values with df = df.date.apply(checkdate)
Upvotes: 1