Reputation: 33
I am trying to add time/duration values from a CSV file that I have but I have failed so far. Here's the sample csv that I'm trying to add up.
Is getting this output possible?
Output:
I have been trying to add up the datetime but I always fail:
finput = open("./Test.csv", "r")
while 1:
line = finput.readline()
if not line:
break
else:
user = line.split(delim)[0]
direction = line.split(delim)[1]
duration = line.split(delim)[2]
durationz = 0:00:00
for k in duration:
durationz += k
Also: is there a specific way to declare a time value?
Upvotes: 0
Views: 2459
Reputation: 473
First, you might find python's built-in csv module to be helpful. Instead of having to manually split your lines and assign data, you can just do the following:
import csv
with open("test.csv", mode="r") as f:
reader = csv.reader(f)
for row in reader:
user, direction, duration = row # this is equivalent to your own variable assignment code,
# using a cool feature of python called tuple unpacking
A dictionary would be a great way to group the data by user. Here is what that might look like:
...
user_dict = {}
for row in reader:
user, direction, duration = row
user_dict[user] = user_dict.get(user, default={"in": "0:00:00", "out": "0:00:00"})
user_dict[user][direction] = duration
Once that runs through the whole input csv you should have a dictionary containing an entry for every user, with every user entry containing their respective "in" and "out" values. If they are missing either an in or out value in the csv, it has been set to "0:00:00" by using the "default" parameter of the dictionary.get() statement.
We could manually parse the times, but dealing with time addition ourselves would be a huge pain. Luckily, python has a built-in module for dealing with time, called datetime.
import csv
import datetime
user_dict = {}
with open("test.csv", mode="r") as f:
reader = csv.reader(f)
for row in reader:
user, direction, duration = row
hour, minute, second = duration.split(":")
# since the split left us with strings, and datetime needs integers, we'll need to cast everything to an int.
hour = int(hour)
minute = int(minute)
second = int(second)
# (we could have done the above more concisely using a list comprehension, which would look like this:
# hour, minute, second = [int(time) for time in duration.split(":")]
# to add time values we'll use the timedelta function in datetime, which takes days then seconds as its arguments.
# We'll just use seconds, so we'll need to convert the hours and minutes first.
seconds = second + minute*60 + hour*60*60
duration = datetime.timedelta(0, seconds)
user_dict[user] = user_dict.get(user, default={"in": datetime.timedelta(0,0), "out": datetime.timedelta(0,0)})
user_dict[user][direction] = duration
Looking at your example, we're just adding the in time to the out time (though if we wanted total time on the clock we would want to subtract the in time from the out time). We can do the addition part with the following:
output = []
for user, time_dict in user_dict.items():
total = time_dict["in"] + time_dict["out"]
output.append([user, time_dict["in"], time_dict["out"], total])
with open("output.csv", mode="w") as f:
writer = csv.writer(f)
writer.writerows(output)
That should give you close to what you want, though the output will be a single row for each user -- the data will appear horizontally instead of vertically.
All the code together:
import csv
import datetime
user_dict = {}
with open("test.csv", mode="r") as f:
reader = csv.reader(f)
for row in reader:
user, direction, duration = row
hour, minute, second = [int(time) for time in duration.split(":")]
seconds = second + minute*60 + hour*60*60
duration = datetime.timedelta(0, seconds)
user_dict[user] = user_dict.get(user, default={"in": datetime.timedelta(0,0), "out": datetime.timedelta(0,0)})
user_dict[user][direction] = duration
output = []
for user, time_dict in user_dict.items():
total = time_dict["in"] + time_dict["out"]
output.append([user, time_dict["in"], time_dict["out"], total])
with open("output.csv", mode="w") as f:
writer = csv.writer(f)
header = ["name", "time in", "time out", "total time"]
writer.writerow(header)
writer.writerows(output)
Upvotes: 1
Reputation: 1121962
Use datetime.timedelta()
objects to model the durations, and pass in the 3 components as seconds, minutes and hours.
Parse your file with the csv
module; no point in re-inventing the character-separated-values-parsing wheel here.
Use a dictionary to track In and Out values per user; using a collections.defaultdict()
object will make it easier to add new users:
from collections import defaultdict
from datetime import timedelta
import csv
durations = defaultdict(lambda: {'In': timedelta(), 'Out': timedelta()})
with open("./Test.csv", "rb") as inf:
reader = csv.reader(inf, delimiter=delim)
for name, direction, duration in reader:
hours, minutes, seconds = map(int, duration.split(':'))
duration = timedelta(hours=hours, minutes=minutes, seconds=seconds)
durations[name][direction] += duration
for name, directions in durations.items():
print '{:10} In {}'.format(name, directions['In'])
print ' Out {}'.format(directions['Out'])
print ' Total {}'.format(
directions['In'] + directions['Out'])
timedelta()
objects, when converted back to strings (such as when printing or formatting with str.format()
are converted to the h:mm:ss
format again.
Demo:
>>> import csv
>>> from collections import defaultdict
>>> from datetime import timedelta
>>> sample = '''\
... Johnny,In,0:02:36
... Kate,Out,0:02:15
... Paul,In,0:03:57
... Chris,In,0:01:26
... Jonathan,In,0:00:37
... Kyle,In,0:06:46
... Armand,Out,0:00:22
... Ryan,In,0:00:51
... Jonathan,Out,0:12:19
... '''.splitlines()
>>> durations = defaultdict(lambda: {'In': timedelta(), 'Out': timedelta()})
>>> reader = csv.reader(sample)
>>> for name, direction, duration in reader:
... hours, minutes, seconds = map(int, duration.split(':'))
... duration = timedelta(hours=hours, minutes=minutes, seconds=seconds)
... durations[name][direction] += duration
...
>>> for name, directions in durations.items():
... print '{:10} In {}'.format(name, directions['In'])
... print ' Out {}'.format(directions['Out'])
... print ' Total {}'.format(
... directions['In'] + directions['Out'])
...
Johnny In 0:02:36
Out 0:00:00
Total 0:02:36
Kyle In 0:06:46
Out 0:00:00
Total 0:06:46
Ryan In 0:00:51
Out 0:00:00
Total 0:00:51
Chris In 0:01:26
Out 0:00:00
Total 0:01:26
Paul In 0:03:57
Out 0:00:00
Total 0:03:57
Jonathan In 0:00:37
Out 0:12:19
Total 0:12:56
Kate In 0:00:00
Out 0:02:15
Total 0:02:15
Armand In 0:00:00
Out 0:00:22
Total 0:00:22
Upvotes: 2
Reputation: 31
There are a few things you can fix.
First, you can read every line in your file by doing for line in file
.
You can't declare the variable durationz as 0:00:00. It simply doesn't work in python.
One thing you could do is make durationz 0, and parse the time by turning it into an amount of seconds. Some pseudocode:
split duration string by ":"
add 60 * 60 * hours to duration
add 60 * minutes to duration
add seconds to duration
Upvotes: 0