Reputation: 105
I have a column of date strings and want to convert it to a date type.
I tried pd.to_datetime
with the format I want, but the result still includes the time.
import pandas as pd
df = pd.DataFrame({
'date': ['2010-12-30 23:57:10+00:00', '2010-12-30 23:52:41+00:00','2010-12-30 23:43:04+00:00','2010-12-30 23:37:30+00:00','2010-12-30 23:31:39+00:00'],
'text' : ['El odontólogo Barreda, a un paso de quedar en …','Defederico es el nuevo refuerzo de Independien..','Israel: ex presidente Katzav declarado culpabl…'
, 'FMI estima que la recuperación asimétrica de l…','¿Quién fue el campeón argentino del año? Votá …']
})
df["new date"] = pd.to_datetime(df['date'], format="%Y-%m-%d")
This is the output it returns:
2010-12-30 23:57:10+00:00
I need to remove the time part:
23:57:10+00:00
Upvotes: 1
Views: 3293
Reputation: 18315
Well it's a datetime object, so it needs to keep the time information. However, there's a Period datatype that might fit here: it represents a span of time instead of a stamp:
df["new date"] = pd.to_datetime(df["date"]).dt.to_period(freq="D")
This converts the timestamps to daily periods:
>>> df["new date"]
0 2010-12-30
1 2010-12-30
2 2010-12-30
3 2010-12-30
4 2010-12-30
Name: new date, dtype: period[D]
Note that these are not strings, so .dt
based operations still work.
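As a minimal sketch of that point, the snippet below builds a daily period column from two of the question's timestamps and reads the month and day through .dt. The names dates, periods, months and days are illustrative only. The extra .dt.tz_convert(None) step is an assumption added here: converting timezone-aware timestamps to periods discards the timezone, and pandas may warn about it, so the sketch drops the timezone explicitly first.

```python
import pandas as pd

# Two sample timestamps in the same format as the question's data.
dates = pd.Series(["2010-12-30 23:57:10+00:00", "2010-12-30 23:52:41+00:00"])

# Drop the timezone explicitly (assumption: avoids the implicit drop in to_period),
# then keep only daily periods.
periods = pd.to_datetime(dates).dt.tz_convert(None).dt.to_period(freq="D")

# The .dt accessor is still available on a period column.
months = periods.dt.month
days = periods.dt.day

print(periods.dtype)
print(months.tolist(), days.tolist())
```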
If you do need the datetime type, though, you can call .dt.normalize()
on the timestamps, which sets every time component to midnight to signal that it is immaterial:
>>> df["new date"] = pd.to_datetime(df["date"]).dt.normalize()
>>> df["new date"]
0 2010-12-30 00:00:00+00:00
1 2010-12-30 00:00:00+00:00
2 2010-12-30 00:00:00+00:00
3 2010-12-30 00:00:00+00:00
4 2010-12-30 00:00:00+00:00
Name: new date, dtype: datetime64[ns, UTC]
Note that after normalization, the display normally omits the all-zero time if the original timestamps carry no timezone information (the part after "+"). Yours do, so the zeros appear in the output as well. To get rid of them in such cases, you can chain .dt.tz_convert(tz=None)
which removes the timezone information and, with it, the all-zero time in the output. The result is still of datetime type.
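A short sketch of that chain, checking that the result is still a datetime column and no longer carries a timezone. The variable names are illustrative, and the data is a two-row sample of the question's column.

```python
import pandas as pd

dates = pd.Series(["2010-12-30 23:57:10+00:00", "2010-12-30 23:52:41+00:00"])

# Set every timestamp to midnight, then drop the timezone information.
normalized = pd.to_datetime(dates).dt.normalize().dt.tz_convert(tz=None)

# Still a datetime column, now timezone-naive.
still_datetime = pd.api.types.is_datetime64_any_dtype(normalized)

print(normalized)
print(still_datetime, normalized.dt.tz)
```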
Lastly, if this is only for display purposes, you can use .dt.strftime
to format them as desired:
>>> df["new date"] = pd.to_datetime(df["date"]).dt.strftime("%Y-%m-%d")
>>> df["new date"]
0 2010-12-30
1 2010-12-30
2 2010-12-30
3 2010-12-30
4 2010-12-30
Name: new date, dtype: object
As you can see, the dtype here is "object", i.e., string, which prevents datetime-based operations. For example, df["new date"].dt.month
would no longer work, unlike with the first two alternatives.
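The following sketch illustrates that limitation on a two-row sample: .dt fails on the formatted strings, and parsing them again with pd.to_datetime restores a datetime column. The names as_strings, dt_available and restored are illustrative only.

```python
import pandas as pd

dates = pd.Series(["2010-12-30 23:57:10+00:00", "2010-12-30 23:52:41+00:00"])

# Format as plain strings; the values are no longer datetimes.
as_strings = pd.to_datetime(dates).dt.strftime("%Y-%m-%d")

# String values have no datetime accessor, so .dt raises AttributeError.
try:
    as_strings.dt.month
    dt_available = True
except AttributeError:
    dt_available = False

# Parsing the strings again restores a datetime column.
restored = pd.to_datetime(as_strings, format="%Y-%m-%d")

print(dt_available, restored.dt.month.tolist())
```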
Upvotes: 2
Reputation: 120539
To keep a datetime column and its dt
accessor, you can use dt.normalize()
to reset the time part, then dt.tz_convert(None)
to remove the timezone information:
df['new date'] = pd.to_datetime(df["date"]).dt.normalize().dt.tz_convert(None)
Output
>>> df['new date']
0 2010-12-30
1 2010-12-30
2 2010-12-30
3 2010-12-30
4 2010-12-30
Name: new date, dtype: datetime64[ns]
Upvotes: 0