Parsing timestamp using csv module and datetime module

I'm having some trouble with the datetime module in Python. I have this data from a CSV file:

user_id,timestamp
563,0:00:21
671,0:00:26
780,0:00:28

This is my code:

import csv
from datetime import datetime

path = "/home/haldrik/dev/python/data/dataset.csv"
file = open(path, newline='')

reader = csv.reader(file, delimiter=',')

header = next(reader) # Ignore first row.

data = []
for row in reader:
    # row = [user_id, timestamp]
    user_id = row[0]
    timestamp = datetime.strptime(row[1], '%H:%M:%S').time()
    
    data.append([user_id, timestamp])

That code throws this error:

Traceback (most recent call last):
  File "/home/haldrik/dev/python/instances_web_site.py", line 15, in <module>
    date = datetime.strptime(row[1], '%H:%M:%S').time()
  File "/usr/lib/python3.8/_strptime.py", line 568, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
  File "/usr/lib/python3.8/_strptime.py", line 349, in _strptime
    raise ValueError("time data %r does not match format %r" %
ValueError: time data '' does not match format '%H:%M:%S'

I can't find where the error is; as far as I can see, the data matches the time format I specified.

Isolating the CSV import step, I can confirm that it works. See this snippet (not part of the code above):

data_import = [row for row in reader]
print(data_import[0])

It outputs this:

['563','0:00:21']


Trenton McKinney

  • One or more values in the timestamp column are empty. A row such as 440, (nothing after the comma) gives row[1] == '', and strptime raises ValueError: time data '' does not match format '%H:%M:%S'.
  • Wrap timestamp = datetime.strptime(row[1], '%H:%M:%S').time() in a try-except block that catches ValueError.
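The cause can be reproduced in isolation: csv.reader turns a line like 440, into a two-field row whose second field is the empty string, and strptime rejects that empty string. A minimal sketch, using an in-memory sample as a stand-in for the real file:

```python
import csv
import io
from datetime import datetime

# In-memory stand-in for the CSV file; the last line has nothing after the comma.
sample = "user_id,timestamp\n563,0:00:21\n440,\n"

rows = list(csv.reader(io.StringIO(sample)))
last_row = rows[-1]
print(last_row)  # ['440', '']

error_message = None
try:
    datetime.strptime(last_row[1], '%H:%M:%S')
except ValueError as exc:
    # strptime cannot match an empty string against '%H:%M:%S'.
    error_message = str(exc)
print(error_message)  # time data '' does not match format '%H:%M:%S'
```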

test.csv

user_id,timestamp
563,0:00:21
671,0:00:26
780,0:00:28
440,
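To find which rows are affected in a larger file, one option is to scan every row and record the line number of each one that fails to parse. A minimal sketch; find_bad_timestamps is a hypothetical helper name, and the in-memory sample simply mirrors test.csv above:

```python
import csv
import io
from datetime import datetime


def find_bad_timestamps(f, fmt='%H:%M:%S'):
    """Return (line_number, row) for every data row whose timestamp fails to parse."""
    reader = csv.reader(f)
    next(reader)  # skip the header row
    bad = []
    for row in reader:
        try:
            datetime.strptime(row[1], fmt)
        except (ValueError, IndexError):
            # ValueError: empty or malformed timestamp.
            # IndexError: the row has no second field (e.g. a blank line gives []).
            # reader.line_num is the number of source lines read so far.
            bad.append((reader.line_num, row))
    return bad


# In-memory copy of test.csv, standing in for the real file.
sample = "user_id,timestamp\n563,0:00:21\n671,0:00:26\n780,0:00:28\n440,\n"
print(find_bad_timestamps(io.StringIO(sample)))  # [(5, ['440', ''])]
```

For the real dataset, pass the file object from with open(path, newline='') as f: instead of the StringIO.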

Code

import csv
from datetime import datetime

path = "test.csv"

data = []
# The with block closes the file automatically when parsing is done.
with open(path, newline='') as file:
    reader = csv.reader(file, delimiter=',')
    header = next(reader)  # skip the header row

    for row in reader:
        # row = [user_id, timestamp]
        user_id = row[0]
        try:
            timestamp = datetime.strptime(row[1], '%H:%M:%S').time()
        except ValueError:
            # Empty or malformed timestamp: keep the raw string instead.
            timestamp = row[1]
            # continue  # uncomment to skip this row instead of adding it to data

        data.append([user_id, timestamp])


print(data)
[out]:
[['563', datetime.time(0, 0, 21)], ['671', datetime.time(0, 0, 26)], ['780', datetime.time(0, 0, 28)], ['440', '']]
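Keeping the raw string means the timestamp column ends up holding a mix of datetime.time objects and strings. If downstream code expects a single type, storing None for unparseable values may be easier to work with. A minimal sketch of that variant; parse_time is a hypothetical helper name, and the in-memory sample stands in for the real file:

```python
import csv
import io
from datetime import datetime, time
from typing import Optional


def parse_time(value: str, fmt: str = '%H:%M:%S') -> Optional[time]:
    """Parse an H:MM:SS string into a datetime.time, or return None if it is empty or malformed."""
    try:
        return datetime.strptime(value, fmt).time()
    except ValueError:
        return None


sample = "user_id,timestamp\n563,0:00:21\n440,\n"
reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the header row

# Assumes every data row has exactly two fields; a blank line would break the unpacking.
data = [[user_id, parse_time(ts)] for user_id, ts in reader]
print(data)  # [['563', datetime.time(0, 0, 21)], ['440', None]]
```

Using None keeps the column type uniform (time or None), so later code can test for missing values with `is None` instead of checking for strings.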
