Reputation: 2213
I am trying to read a CSV file and cast one of its columns to datetime. However, some data points, e.g. 2019-01-03 12:00:00,
are missing the milliseconds, while the rest of the data contains them, and I do not know why. This causes an error.
My question is two-fold:
Sorry. Not sure why the code is not displaying properly here.
from datetime import datetime
import pandas as pd

custom_date_parser = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S.%f')
df = pd.read_csv('abc.csv', parse_dates=['endTime'], date_parser=custom_date_parser)
endTime
0 2019-01-02 09:40:22.668
1 2019-01-02 09:48:09.040
2 2019-01-02 09:54:54.209
3 2019-01-02 09:59:28.768
4 2019-01-02 10:06:33.820
5 2019-01-02 10:17:38.818
6 2019-01-02 10:30:26.999
7 2019-01-02 10:43:54.516
8 2019-01-02 11:04:26.652
9 2019-01-02 11:30:22.316
10 2019-01-02 11:59:59.751
11 2019-01-03 09:37:11.223
12 2019-01-03 09:49:06.226
13 2019-01-03 10:01:58.397
14 2019-01-03 10:15:20.918
15 2019-01-03 10:31:28.438
16 2019-01-03 10:52:26.130
17 2019-01-03 11:07:09.128
18 2019-01-03 11:22:00.907
19 2019-01-03 11:45:55.349
20 2019-01-03 12:00:00
21 2019-01-04 09:39:48.753
22 2019-01-04 09:48:06.856
23 2019-01-04 09:58:44.608
24 2019-01-04 10:10:49.498
25 2019-01-04 10:26:29.543
26 2019-01-04 10:39:36.750
27 2019-01-04 10:49:59.504
28 2019-01-04 11:00:02.138
29 2019-01-04 11:11:20.630
30 2019-01-04 11:27:59.402
31 2019-01-04 11:52:12.061
32 2019-01-04 11:59:59.879
33 2019-01-07 09:36:06.436
34 2019-01-07 09:44:07.126
35 2019-01-07 09:54:28.718
36 2019-01-07 10:05:54.130
37 2019-01-07 10:19:45.046
38 2019-01-07 10:38:15.991
39 2019-01-07 11:01:45.755
40 2019-01-07 11:17:39.586
41 2019-01-07 11:45:39.668
42 2019-01-07 12:00:00
The error message is below:
File "C:\Users\someone\AppData\Local\Programs\Spyder\pkgs\pandas\io\parsers.py", line 3298, in converter
date_parser(*date_cols), errors="ignore", cache=cache_dates
File "<ipython-input-1-26516a4dc77b>", line 34, in <lambda>
custom_date_parser = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S.%f')
TypeError: strptime() argument 1 must be str, not numpy.ndarray
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\someone\AppData\Local\Programs\Spyder\pkgs\pandas\io\parsers.py", line 3309, in converter
dayfirst=dayfirst,
File "pandas\_libs\tslibs\parsing.pyx", line 589, in pandas._libs.tslibs.parsing.try_parse_dates
File "<ipython-input-1-26516a4dc77b>", line 34, in <lambda>
custom_date_parser = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S.%f')
File "D:\obj\windows-release\37amd64_Release\msi_python\zip_amd64\_strptime.py", line 577, in _strptime_datetime
File "D:\obj\windows-release\37amd64_Release\msi_python\zip_amd64\_strptime.py", line 359, in _strptime
ValueError: time data '2019-01-03 12:00:00' does not match format '%Y-%m-%d %H:%M:%S.%f'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<ipython-input-2-9b9600d4b508>", line 1, in <module>
df_bars = pd.read_csv(f'C:\\Users\\someone\\Desktop\\CV\\2021\\data\\abc.csv',parse_dates=['endTime'],date_parser=custom_date_parser)
File "C:\Users\someone\AppData\Local\Programs\Spyder\pkgs\pandas\io\parsers.py", line 610, in read_csv
return _read(filepath_or_buffer, kwds)
File "C:\Users\someone\AppData\Local\Programs\Spyder\pkgs\pandas\io\parsers.py", line 468, in _read
return parser.read(nrows)
File "C:\Users\someone\AppData\Local\Programs\Spyder\pkgs\pandas\io\parsers.py", line 1057, in read
index, columns, col_dict = self._engine.read(nrows)
File "C:\Users\someone\AppData\Local\Programs\Spyder\pkgs\pandas\io\parsers.py", line 2113, in read
names, data = self._do_date_conversions(names, data)
File "C:\Users\someone\AppData\Local\Programs\Spyder\pkgs\pandas\io\parsers.py", line 1846, in _do_date_conversions
keep_date_col=self.keep_date_col,
File "C:\Users\someone\AppData\Local\Programs\Spyder\pkgs\pandas\io\parsers.py", line 3352, in _process_date_conversion
data_dict[colspec] = converter(data_dict[colspec])
File "C:\Users\someone\AppData\Local\Programs\Spyder\pkgs\pandas\io\parsers.py", line 3314, in converter
return generic_parser(date_parser, *date_cols)
File "C:\Users\someone\AppData\Local\Programs\Spyder\pkgs\pandas\io\date_converters.py", line 100, in generic_parser
results[i] = parse_func(*args)
File "<ipython-input-1-26516a4dc77b>", line 34, in <lambda>
custom_date_parser = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S.%f')
File "D:\obj\windows-release\37amd64_Release\msi_python\zip_amd64\_strptime.py", line 577, in _strptime_datetime
File "D:\obj\windows-release\37amd64_Release\msi_python\zip_amd64\_strptime.py", line 359, in _strptime
ValueError: time data '2019-01-03 12:00:00' does not match format '%Y-%m-%d %H:%M:%S.%f'
Upvotes: 0
Views: 256
Reputation: 24304
You can try:
def custom_date_parser(x):
    return pd.to_datetime(x, format='%Y-%m-%d %H:%M:%S.%f', errors='coerce')

# Finally:
df = pd.read_csv('abc.csv', parse_dates=['endTime'], date_parser=custom_date_parser)
Or don't use date_parser at all and let pandas infer the format:
df = pd.read_csv('abc.csv', parse_dates=['endTime'])
Note: PEP 8 recommends against assigning a lambda to a name.
For a detailed explanation, see: Is it pythonic: naming lambdas
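One caveat with errors='coerce': depending on the pandas version, rows that do not match the format exactly (such as 2019-01-03 12:00:00) may come back as NaT instead of being parsed. As a rough sketch of an alternative, a per-value parser can try the format with fractional seconds first and fall back to the one without. The helper name parse_end_time and the FORMATS tuple are illustrative, not part of pandas; the function could be applied to a column read as strings, e.g. with df['endTime'].map(parse_end_time).

```python
from datetime import datetime

# Hypothetical helper; the two formats are taken from the sample data
# in the question (with and without fractional seconds).
FORMATS = ('%Y-%m-%d %H:%M:%S.%f', '%Y-%m-%d %H:%M:%S')


def parse_end_time(value):
    """Parse one timestamp string, with or without fractional seconds."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            # This format did not match; try the next one.
            continue
    raise ValueError(f'unrecognised timestamp: {value!r}')


print(parse_end_time('2019-01-03 11:45:55.349'))
print(parse_end_time('2019-01-03 12:00:00'))
```

Raising on an unknown format, rather than returning NaT, keeps genuinely malformed rows visible instead of silently dropping them.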
Upvotes: 3
Reputation: 45
By default it adds .000. What is the exact error you are seeing?
import pandas as pd
df = pd.DataFrame({'date': ['2016-6-10 09:40:22.668',
'2016-7-1 19:45:30.532',
'2013-10-12 4:5:1'],
'value': [2, 3, 4]})
df['date'] = pd.to_datetime(df['date'], format="%Y-%m-%d %H:%M:%S.%f")
print(df)
Output:
                     date  value
0 2016-06-10 09:40:22.668      2
1 2016-07-01 19:45:30.532      3
2 2013-10-12 04:05:01.000      4
Upvotes: 0
Reputation: 1049
If pd.to_datetime does not help you, you could also filter the rows for each format and convert them individually. See this answer for reference.
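A minimal sketch of that idea, assuming the column has been read as plain strings and that only the two formats seen in the question occur. The variable names are illustrative, and the sample values are copied from the question's data.

```python
import pandas as pd

# Sample values copied from the question; `raw` is an illustrative name.
raw = pd.Series(['2019-01-03 11:45:55.349',
                 '2019-01-03 12:00:00',
                 '2019-01-04 09:39:48.753'])

# Rows containing a literal '.' carry fractional seconds; the rest do not.
has_fraction = raw.str.contains('.', regex=False)

# Convert each group with the format that matches it.
with_fraction = pd.to_datetime(raw[has_fraction],
                               format='%Y-%m-%d %H:%M:%S.%f')
without_fraction = pd.to_datetime(raw[~has_fraction],
                                  format='%Y-%m-%d %H:%M:%S')

# Recombine the two groups and restore the original row order.
end_time = pd.concat([with_fraction, without_fraction]).sort_index()
print(end_time)
```

Because each group is parsed with an explicit format, a row that fits neither format raises an error instead of being guessed at.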
Upvotes: 0