iLuPa

Reputation: 151

Parsing timestamp in Python with Pandas doesn't return a datetime64

I'm trying to parse a CSV file into a DataFrame because I need to do some analysis on the timestamps. The CSV file is well structured, and I can read it without a problem using pd.read_csv:

import pandas as pd
import datetime as dt

df = pd.read_csv('trip_data.csv', low_memory=False, parse_dates=['datetime'], infer_datetime_format=True)

However, even when passing parse_dates and infer_datetime_format as arguments, I still end up with a DataFrame whose timestamp column has not been parsed:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8771828 entries, 0 to 8771827
Data columns (total 3 columns):
UserID                   int64
datetime                 object
amount                   float64
dtypes: float64(1), int64(1), object(1)
memory usage: 1.1+ GB

So when I try to get the minimum date, e.g.:

print(df['datetime'].min())

I get an incorrect answer: I can see that the minimum timestamp in my df is 2018-01-01 00:08:26, but I get 2018-01-27 04:06:37 as the minimum. Am I missing anything, or is there another way to cast this column to datetime64?

Here's a peek at my CSV file:

UserID,datetime,amount
1,2018-01-01 00:21:05,5.8
1,2018-01-01 00:44:55,15.3
1,2018-01-01 00:08:26,8.3
1,2018-01-01 00:20:22,34.8
1,2018-01-01 00:09:18,16.55
1,2018-01-01 00:29:29,5.8
1,2018-01-01 00:38:08,12.35
1,2018-01-01 00:49:29,6.3

Upvotes: 1

Views: 565

Answers (2)

KillerJoe

Reputation: 48

You can convert your column to datetime manually:

df['datetime'] = pd.to_datetime(df['datetime'])

and then

print(df['datetime'].min())
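
If the plain conversion raises an error or the column still comes back as object, one possible cause is that a few values don't match the timestamp format; read_csv leaves a column unaltered as object when it hits a value it can't parse. Here's a minimal sketch for tracking such rows down. It assumes the format seen in the question's sample (%Y-%m-%d %H:%M:%S), and the inline CSV with its "not a date" row is made up for illustration, not taken from the real trip_data.csv:

```python
import io
import pandas as pd

# Hypothetical sample: a few rows from the question plus one malformed value
csv_text = """UserID,datetime,amount
1,2018-01-01 00:21:05,5.8
1,2018-01-01 00:44:55,15.3
1,2018-01-01 00:08:26,8.3
1,not a date,34.8
"""

df = pd.read_csv(io.StringIO(csv_text))

# errors='coerce' turns anything that doesn't match the format into NaT
# instead of raising, so the result is always a datetime64 column
parsed = pd.to_datetime(df['datetime'], format='%Y-%m-%d %H:%M:%S', errors='coerce')

# Rows that failed to parse (this also flags values that were missing to begin with)
bad_rows = df[parsed.isna()]
print(bad_rows)

df['datetime'] = parsed
earliest = df['datetime'].min()  # NaT values are skipped by min()
print(df['datetime'].dtype, earliest)
```

Inspecting bad_rows before overwriting the column lets you decide whether to drop or repair the offending rows. Passing an explicit format should also be noticeably faster than letting pandas guess on a file with millions of rows.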

Upvotes: 2

lweislo

Reputation: 49

Without a peek at your data source, it's hard to give advice on how to fix this, but a good place to look might be the documentation on parsing datetime from CSV here.

A first step might be to try passing parse_dates=True, infer_datetime_format=True to pd.read_csv.
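
As a rough sketch of the read_csv route: note that parse_dates=True asks pandas to try parsing the index, so for a named column you'd normally pass a list of column names instead. The example below assumes clean data in the format shown in the question and uses an inline sample rather than the real file. It leaves out infer_datetime_format, which is deprecated as of pandas 2.0:

```python
import io
import pandas as pd

# Inline sample in the same layout as the question's CSV (illustrative only)
csv_text = """UserID,datetime,amount
1,2018-01-01 00:21:05,5.8
1,2018-01-01 00:44:55,15.3
1,2018-01-01 00:08:26,8.3
"""

# Name the column explicitly; parse_dates=True would target the index instead
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['datetime'])

# Check that the column really was converted before relying on min()/max()
is_datetime = pd.api.types.is_datetime64_any_dtype(df['datetime'])
earliest = df['datetime'].min()
print(is_datetime, earliest)
```

If is_datetime comes back False on the real file, that would suggest at least one value in the column couldn't be parsed, and it's worth looking at the raw values.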

Upvotes: 0
