user1769197

Reputation: 2213

Python: date parsing

I am trying to read a csv file and cast one of the columns as datetime. However, some data points (e.g. 2019-01-03 12:00:00) are missing the milliseconds, while the rest of the data contains milliseconds, and I do not know why. This causes an error.

My question is two-fold:

  1. Since the code below generates an error, how do I get around this and parse the datetime column?
  2. If I were to reproduce this csv file, how can I ensure that all datetime values have milliseconds?

Sorry. Not sure why the code is not displaying properly here.

from datetime import datetime
import pandas as pd
custom_date_parser = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S.%f')
df = pd.read_csv('abc.csv', parse_dates=['endTime'], date_parser=custom_date_parser)


    endTime
0   2019-01-02 09:40:22.668
1   2019-01-02 09:48:09.040
2   2019-01-02 09:54:54.209
3   2019-01-02 09:59:28.768
4   2019-01-02 10:06:33.820
5   2019-01-02 10:17:38.818
6   2019-01-02 10:30:26.999
7   2019-01-02 10:43:54.516
8   2019-01-02 11:04:26.652
9   2019-01-02 11:30:22.316
10  2019-01-02 11:59:59.751
11  2019-01-03 09:37:11.223
12  2019-01-03 09:49:06.226
13  2019-01-03 10:01:58.397
14  2019-01-03 10:15:20.918
15  2019-01-03 10:31:28.438
16  2019-01-03 10:52:26.130
17  2019-01-03 11:07:09.128
18  2019-01-03 11:22:00.907
19  2019-01-03 11:45:55.349
20  2019-01-03 12:00:00
21  2019-01-04 09:39:48.753
22  2019-01-04 09:48:06.856
23  2019-01-04 09:58:44.608
24  2019-01-04 10:10:49.498
25  2019-01-04 10:26:29.543
26  2019-01-04 10:39:36.750
27  2019-01-04 10:49:59.504
28  2019-01-04 11:00:02.138
29  2019-01-04 11:11:20.630
30  2019-01-04 11:27:59.402
31  2019-01-04 11:52:12.061
32  2019-01-04 11:59:59.879
33  2019-01-07 09:36:06.436
34  2019-01-07 09:44:07.126
35  2019-01-07 09:54:28.718
36  2019-01-07 10:05:54.130
37  2019-01-07 10:19:45.046
38  2019-01-07 10:38:15.991
39  2019-01-07 11:01:45.755
40  2019-01-07 11:17:39.586
41  2019-01-07 11:45:39.668
42  2019-01-07 12:00:00

The error msg is below:

  File "C:\Users\someone\AppData\Local\Programs\Spyder\pkgs\pandas\io\parsers.py", line 3298, in converter
    date_parser(*date_cols), errors="ignore", cache=cache_dates

  File "<ipython-input-1-26516a4dc77b>", line 34, in <lambda>
    custom_date_parser                      = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S.%f')

TypeError: strptime() argument 1 must be str, not numpy.ndarray


During handling of the above exception, another exception occurred:

Traceback (most recent call last):

  File "C:\Users\someone\AppData\Local\Programs\Spyder\pkgs\pandas\io\parsers.py", line 3309, in converter
    dayfirst=dayfirst,

  File "pandas\_libs\tslibs\parsing.pyx", line 589, in pandas._libs.tslibs.parsing.try_parse_dates

  File "<ipython-input-1-26516a4dc77b>", line 34, in <lambda>
    custom_date_parser                      = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S.%f')

  File "D:\obj\windows-release\37amd64_Release\msi_python\zip_amd64\_strptime.py", line 577, in _strptime_datetime

  File "D:\obj\windows-release\37amd64_Release\msi_python\zip_amd64\_strptime.py", line 359, in _strptime

ValueError: time data '2019-01-03 12:00:00' does not match format '%Y-%m-%d %H:%M:%S.%f'


During handling of the above exception, another exception occurred:

Traceback (most recent call last):

  File "<ipython-input-2-9b9600d4b508>", line 1, in <module>
    df_bars = pd.read_csv(f'C:\\Users\\someone\\Desktop\\CV\\2021\\data\\abc.csv',parse_dates=['endTime'],date_parser=custom_date_parser)

  File "C:\Users\someone\AppData\Local\Programs\Spyder\pkgs\pandas\io\parsers.py", line 610, in read_csv
    return _read(filepath_or_buffer, kwds)

  File "C:\Users\someone\AppData\Local\Programs\Spyder\pkgs\pandas\io\parsers.py", line 468, in _read
    return parser.read(nrows)

  File "C:\Users\someone\AppData\Local\Programs\Spyder\pkgs\pandas\io\parsers.py", line 1057, in read
    index, columns, col_dict = self._engine.read(nrows)

  File "C:\Users\someone\AppData\Local\Programs\Spyder\pkgs\pandas\io\parsers.py", line 2113, in read
    names, data = self._do_date_conversions(names, data)

  File "C:\Users\someone\AppData\Local\Programs\Spyder\pkgs\pandas\io\parsers.py", line 1846, in _do_date_conversions
    keep_date_col=self.keep_date_col,

  File "C:\Users\someone\AppData\Local\Programs\Spyder\pkgs\pandas\io\parsers.py", line 3352, in _process_date_conversion
    data_dict[colspec] = converter(data_dict[colspec])

  File "C:\Users\someone\AppData\Local\Programs\Spyder\pkgs\pandas\io\parsers.py", line 3314, in converter
    return generic_parser(date_parser, *date_cols)

  File "C:\Users\someone\AppData\Local\Programs\Spyder\pkgs\pandas\io\date_converters.py", line 100, in generic_parser
    results[i] = parse_func(*args)

  File "<ipython-input-1-26516a4dc77b>", line 34, in <lambda>
    custom_date_parser                      = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S.%f')

  File "D:\obj\windows-release\37amd64_Release\msi_python\zip_amd64\_strptime.py", line 577, in _strptime_datetime

  File "D:\obj\windows-release\37amd64_Release\msi_python\zip_amd64\_strptime.py", line 359, in _strptime

ValueError: time data '2019-01-03 12:00:00' does not match format '%Y-%m-%d %H:%M:%S.%f'

Upvotes: 0

Views: 256

Answers (3)

Anurag Dabas

Reputation: 24304

You can try:

def custom_date_parser(x):
    return pd.to_datetime(x,format='%Y-%m-%d %H:%M:%S.%f',errors='coerce')

#Finally:
df = pd.read_csv('abc.csv',parse_dates=['endTime'],date_parser=custom_date_parser)

OR

Don't use date_parser at all and let pandas infer the format:

df = pd.read_csv('abc.csv',parse_dates=['endTime'])

Note: PEP 8 recommends against assigning a lambda expression to a name; use a def statement instead.

You can get a detailed explanation at: Is it pythonic: naming lambdas
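If you would rather keep a plain strptime-based parser, one option is to try the millisecond format first and fall back to the format without the fraction. This is only a minimal sketch: parse_end_time and FORMATS are names made up for illustration, and the two formats are assumed from the sample rows in the question.

```python
from datetime import datetime

# Formats to try, most specific first. Both are assumptions based on the
# sample rows shown in the question.
FORMATS = ('%Y-%m-%d %H:%M:%S.%f', '%Y-%m-%d %H:%M:%S')


def parse_end_time(text):
    """Parse a timestamp string with or without a fractional-second part."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            # This format did not match; try the next one.
            continue
    raise ValueError(f'unrecognised timestamp: {text!r}')


print(parse_end_time('2019-01-03 11:45:55.349'))
print(parse_end_time('2019-01-03 12:00:00'))
```

A function like this works on one string at a time, so it could be applied after reading the column as text, for example with df['endTime'].map(parse_end_time).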

Upvotes: 3

Nukala Raghava Aditya

Reputation: 45

By default it adds .000. What is the exact error you are seeing?

import pandas as pd
df = pd.DataFrame({'date': ['2016-6-10 09:40:22.668', 
                            '2016-7-1 19:45:30.532', 
                            '2013-10-12 4:5:1'],
                   'value': [2, 3, 4]})
df['date'] = pd.to_datetime(df['date'], format="%Y-%m-%d %H:%M:%S.%f")
print(df)

Output:

         date                value
0 2016-06-10 09:40:22.668      2
1 2016-07-01 19:45:30.532      3
2 2013-10-12 04:05:01.000      4
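On the second part of the question (making sure every value is written with milliseconds), one possible approach is to format each timestamp explicitly before writing the file, instead of relying on the default string conversion. A minimal sketch using only the standard library; format_with_ms is a hypothetical helper name, not something from the question:

```python
from datetime import datetime


def format_with_ms(moment):
    """Render a datetime as 'YYYY-MM-DD HH:MM:SS.mmm', always with 3 digits."""
    # timespec='milliseconds' keeps the fractional part even when it is zero
    # and truncates anything finer than a millisecond.
    return moment.isoformat(sep=' ', timespec='milliseconds')


print(format_with_ms(datetime(2019, 1, 3, 12, 0, 0)))           # 2019-01-03 12:00:00.000
print(format_with_ms(datetime(2019, 1, 2, 9, 40, 22, 668000)))  # 2019-01-02 09:40:22.668
```

Writing the formatted strings to the csv should give every row the same layout, so a single format string can parse the file later.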

Upvotes: 0

JulianWgs

Reputation: 1049

If pd.to_datetime does not help you, you could also filter the rows for each format and convert them individually. See this answer for reference.
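A rough sketch of that idea, assuming the only difference between rows is whether a '.' followed by the fractional seconds is present. The sample values are taken from the question; the variable names are illustrative.

```python
import pandas as pd

# Sample values copied from the question; the middle one has no milliseconds.
raw = pd.Series([
    '2019-01-03 11:45:55.349',
    '2019-01-03 12:00:00',
    '2019-01-04 09:39:48.753',
])

# Rows containing a literal '.' carry a fractional-second part.
has_fraction = raw.str.contains('.', regex=False)

# Convert each group with its own format, then restore the original order.
parsed = pd.concat([
    pd.to_datetime(raw[has_fraction], format='%Y-%m-%d %H:%M:%S.%f'),
    pd.to_datetime(raw[~has_fraction], format='%Y-%m-%d %H:%M:%S'),
]).sort_index()

print(parsed)
```

Because each group is parsed with an exact format, a row that matches neither format raises an error instead of silently becoming NaT.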

Upvotes: 0
