diogenes
diogenes

Reputation: 2109

Sum TimeField hours/minutes with Pandas

I am trying to use Pandas to sum the time (hours, minutes) of a series. The data comes from a TimeField

class PhoneRecord ( models.Model ):
    et = models.TimeField ( null=True, blank=True )

In python I get the record and convert to a dataframe.

phone = PhoneRecord.objects.all()

df = read_frame ( phone )

df.et = df.et.fillna ( '00:00:00' )    # some records are blank

df [ "time" ] = pd.to_datetime(df.et, format = '%H:%M:%S', errors = 'coerce')

this gives me the following output.

0      00:00:35
1      00:00:29
2      00:00:00
3      00:00:00
4      00:00:37
       ......

When I try to sum

df.time.sum ()

I get errors like: unsupported operand type(s) for +: 'datetime.time' and 'datetime.time'

What do I need to do to be able to sum and average the data.

Thank you.

Upvotes: 1

Views: 882

Answers (2)

bb8
bb8

Reputation: 190

You just need to run a custom 1-liner here to combine time objects into timedelta objects which can then be summed together. (see the "print" line)

from datetime import datetime, timedelta
import pandas as pd

phone = PhoneRecord.objects.all()
df = pd.DataFrame(list([i.__dict__ for i in phone])) # create pd.df from model query
df.et = df.et.fillna('00:00:00')  # some records are blank
print(df.et)
print("SUM:", sum([datetime.combine(datetime.min, time) - datetime.min for time in df.et.tolist()], timedelta()))

You should get something like this:

0    00:00:20
1    00:00:20
2    00:00:50
3    00:00:30
4    00:00:20

SUM: 0:02:20

I had to change things a bit to get them to work on my end so hopefully, it is the same with you and your version of Pandas and Django. Hope this helps!

Upvotes: 2

bubble
bubble

Reputation: 1672

All computations, such as getting averages, counting etc. should be performed (if possible) using database engine. I don't know about underlying problem, but using pandas on the server-side to get average values is definitely overkill. You need to look at aggregation facility of Django.

Also, you probably need to restructure the model. If you need to store duration of a phone conversation, you can use FloatField instead, e.g.

class PhoneRecord(models.Model):
    duration = models.FloatField(blank=True, default=0.0, help_text=_('duration in seconds')) 
    # other fields... 
    # also, you can set up  the duration field with `editable=False`, and 
    # calculate its value each time the record is created

In this case you can use Avg:

from django.db.models import Avg
PhoneRecords.objects.all().aggregate(Avg('duration'))

and get something like this

{'duration_avg': 12.3}

Upvotes: 1

Related Questions