user5000054
user5000054

Reputation: 33

Adding time/duration from CSV file

I am trying to add time/duration values from a CSV file that I have but I have failed so far. Here's the sample csv that I'm trying to add up.

enter image description here

Is getting this output possible?

Output: enter image description here

I have been trying to add up the datetime but I always fail:

finput = open("./Test.csv", "r")
while 1:
  line = finput.readline()
  if not line:
    break
  else:
    user = line.split(delim)[0]
    direction = line.split(delim)[1]
    duration = line.split(delim)[2]

    durationz = 0:00:00
    for k in duration:
      durationz += k

Also: is there a specific way to declare a time value?

Upvotes: 0

Views: 2459

Answers (3)

Walker
Walker

Reputation: 473

First, you might find python's built-in csv module to be helpful. Instead of having to manually split your lines and assign data, you can just do the following:

import csv
with open("test.csv", mode="r") as f:
    reader = csv.reader(f)
    for row in reader:
        user, direction, duration = row  # this is equivalent to your own variable assignment code, 
                                         # using a cool feature of python called tuple unpacking

A dictionary would be a great way to group the data by user. Here is what that might look like:

...
user_dict = {}
for row in reader:
    user, direction, duration = row
    user_dict[user] = user_dict.get(user, default={"in": "0:00:00", "out": "0:00:00"})
    user_dict[user][direction] = duration

Once that runs through the whole input csv you should have a dictionary containing an entry for every user, with every user entry containing their respective "in" and "out" values. If they are missing either an in or out value in the csv, it has been set to "0:00:00" by using the "default" parameter of the dictionary.get() statement.

We could manually parse the times, but dealing with time addition ourselves would be a huge pain. Luckily, python has a built-in module for dealing with time, called datetime.

import csv
import datetime

user_dict = {}
with open("test.csv", mode="r") as f:
    reader = csv.reader(f)
    for row in reader:
        user, direction, duration = row
        hour, minute, second = duration.split(":")

        # since the split left us with strings, and datetime needs integers, we'll need to cast everything to an int.
        hour = int(hour)
        minute = int(minute)
        second = int(second)

        # (we could have done the above more concisely using a list comprehension, which would look like this:
        # hour, minute, second = [int(time) for time in duration.split(":")]

        # to add time values we'll use the timedelta function in datetime, which takes days then seconds as its arguments. 
        # We'll just use seconds, so we'll need to convert the hours and minutes first.
        seconds = second + minute*60 + hour*60*60

        duration = datetime.timedelta(0, seconds)

        user_dict[user] = user_dict.get(user, default={"in": datetime.timedelta(0,0), "out": datetime.timedelta(0,0)})
        user_dict[user][direction] = duration

Looking at your example, we're just adding the in time to the out time (though if we wanted total time on the clock we would want to subtract the in time from the out time). We can do the addition part with the following:

output = []
for user, time_dict in user_dict.items():
    total = time_dict["in"] + time_dict["out"]
    output.append([user, time_dict["in"], time_dict["out"], total])

with open("output.csv", mode="w") as f:
    writer = csv.writer(f)
    writer.writerows(output)

That should give you close to what you want, though the output will be a single row for each user -- the data will appear horizontally instead of vertically.

All the code together:

import csv
import datetime

user_dict = {}
with open("test.csv", mode="r") as f:
    reader = csv.reader(f)
    for row in reader:
        user, direction, duration = row
        hour, minute, second = [int(time) for time in duration.split(":")]
        seconds = second + minute*60 + hour*60*60
        duration = datetime.timedelta(0, seconds)

        user_dict[user] = user_dict.get(user, default={"in": datetime.timedelta(0,0), "out": datetime.timedelta(0,0)})
        user_dict[user][direction] = duration

output = []
for user, time_dict in user_dict.items():
    total = time_dict["in"] + time_dict["out"]
    output.append([user, time_dict["in"], time_dict["out"], total])

with open("output.csv", mode="w") as f:
    writer = csv.writer(f)
    header = ["name", "time in", "time out", "total time"]
    writer.writerow(header)
    writer.writerows(output)

Upvotes: 1

Martijn Pieters
Martijn Pieters

Reputation: 1121962

Use datetime.timedelta() objects to model the durations, and pass in the 3 components as seconds, minutes and hours.

Parse your file with the csv module; no point in re-inventing the character-separated-values-parsing wheel here.

Use a dictionary to track In and Out values per user; using a collections.defaultdict() object will make it easier to add new users:

from collections import defaultdict
from datetime import timedelta
import csv

durations = defaultdict(lambda: {'In': timedelta(), 'Out': timedelta()})

with open("./Test.csv", "rb") as inf:
    reader = csv.reader(inf, delimiter=delim)
    for name, direction, duration in reader:
        hours, minutes, seconds = map(int, duration.split(':'))
        duration = timedelta(hours=hours, minutes=minutes, seconds=seconds)
        durations[name][direction] += duration

for name, directions in durations.items():
    print '{:10} In    {}'.format(name, directions['In'])
    print '           Out   {}'.format(directions['Out'])
    print '           Total {}'.format(
        directions['In'] + directions['Out'])

timedelta() objects, when converted back to strings (such as when printing or formatting with str.format() are converted to the h:mm:ss format again.

Demo:

>>> import csv
>>> from collections import defaultdict
>>> from datetime import timedelta
>>> sample = '''\
... Johnny,In,0:02:36
... Kate,Out,0:02:15
... Paul,In,0:03:57
... Chris,In,0:01:26
... Jonathan,In,0:00:37
... Kyle,In,0:06:46
... Armand,Out,0:00:22
... Ryan,In,0:00:51
... Jonathan,Out,0:12:19
... '''.splitlines()
>>> durations = defaultdict(lambda: {'In': timedelta(), 'Out': timedelta()})
>>> reader = csv.reader(sample)
>>> for name, direction, duration in reader:
...     hours, minutes, seconds = map(int, duration.split(':'))
...     duration = timedelta(hours=hours, minutes=minutes, seconds=seconds)
...     durations[name][direction] += duration
... 
>>> for name, directions in durations.items():
...     print '{:10} In    {}'.format(name, directions['In'])
...     print '           Out   {}'.format(directions['Out'])
...     print '           Total {}'.format(
...         directions['In'] + directions['Out'])
... 
Johnny     In    0:02:36
           Out   0:00:00
           Total 0:02:36
Kyle       In    0:06:46
           Out   0:00:00
           Total 0:06:46
Ryan       In    0:00:51
           Out   0:00:00
           Total 0:00:51
Chris      In    0:01:26
           Out   0:00:00
           Total 0:01:26
Paul       In    0:03:57
           Out   0:00:00
           Total 0:03:57
Jonathan   In    0:00:37
           Out   0:12:19
           Total 0:12:56
Kate       In    0:00:00
           Out   0:02:15
           Total 0:02:15
Armand     In    0:00:00
           Out   0:00:22
           Total 0:00:22

Upvotes: 2

dan
dan

Reputation: 31

There are a few things you can fix.

First, you can read every line in your file by doing for line in file.

You can't declare the variable durationz as 0:00:00. It simply doesn't work in python.

One thing you could do is make durationz 0, and parse the time by turning it into an amount of seconds. Some pseudocode:

split duration string by ":"
add 60 * 60 * hours to duration
add 60 * minutes to duration
add seconds to duration

Upvotes: 0

Related Questions