Split value into bins based on time

Question

I'm working in Python, processing New York City subway turnstile data to build a visualization of the entrances and exits at each station.

So far I have a list of entrance/exit counts based on start (03-24-15) and end (03-27-15) dates:

{
'endTime': '03-25-14T21:40:30',
'entriesDuringPeriod': 158,
'exitsDuringPeriod': 597,
'startTime': '03-25-14T17:03:23'
},
{
'endTime': '03-26-14T01:00:00',
'entriesDuringPeriod': 29,
'exitsDuringPeriod': 235,
'startTime': '03-25-14T21:00:00'
},

The problem I have is that the time periods are not standardized and sometimes overlap. I'd like to go through and create another list that normalizes these numbers into one-hour increments.

I'm not very familiar with Python time processing, and I was wondering if someone could provide some information about how to get started taking strings, converting them into date objects, and dividing up values based on time.

The final visualization will be built with d3.js, if that matters.

Travis D. · Accepted Answer

Getting the strings into datetime objects isn't too bad:

from datetime import datetime

def get_datetime(instr):
    # Parse strings such as '03-25-14T21:40:30' (month-day-year, two-digit year)
    # directly into a naive datetime, with no round trip through a local timestamp.
    return datetime.strptime(instr, '%m-%d-%yT%H:%M:%S')

# eg: get_datetime( '03-25-14T21:20:30' ) => datetime.datetime(2014, 3, 25, 21, 20, 30)

Binning/normalizing the data largely depends on how you want to handle the overlapping durations. For example, do you want to assume that people entered and exited at a constant rate, so that if a reading covered an hour and a half, two thirds of the count would go into the full hour and one third into the partial half hour?
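To make the constant-rate assumption concrete, here is a small worked example. The 90-minute period and the count of 90 are made-up illustration values, not taken from the turnstile data:

```python
from datetime import datetime

# Hypothetical reading: 90 entries counted between 17:00 and 18:30.
start = datetime(2014, 3, 25, 17, 0)
end = datetime(2014, 3, 25, 18, 30)
boundary = datetime(2014, 3, 25, 18, 0)  # the hour boundary inside the period
count = 90

total = (end - start).total_seconds()

# Each hour gets a share proportional to the time it overlaps the period.
first_hour = count * (boundary - start).total_seconds() / total
second_hour = count * (end - boundary).total_seconds() / total

print(first_hour, second_hour)  # 60.0 30.0
```

The two shares always add back up to the original count, so nothing is lost by splitting this way.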

EDIT: Based on OP's comment, here's working code that splits each reading linearly across the hours it covers:

from datetime import timedelta
from collections import defaultdict

def add_datum(dd, v):
    """Spread one reading's entries/exits across the clock hours it covers."""
    end_dt = get_datetime(v['endTime'])
    start_dt = get_datetime(v['startTime'])
    total_seconds = (end_dt - start_dt).total_seconds()
    if total_seconds <= 0:
        # Zero-length (or reversed) period: skip it rather than divide by zero.
        return dd

    # Truncate the start time to the top of its hour.
    hour_start = start_dt.replace(minute=0, second=0, microsecond=0)
    hour_end = hour_start + timedelta(hours=1)

    while hour_start < end_dt:
        # Overlap between this clock hour and the reading's period.
        overlap = min(hour_end, end_dt) - max(hour_start, start_dt)
        fraction = overlap.total_seconds() / total_seconds
        dd[hour_start]['hour'] = hour_start
        dd[hour_start]['entries'] += v['entriesDuringPeriod'] * fraction
        dd[hour_start]['exits'] += v['exitsDuringPeriod'] * fraction

        hour_start = hour_end
        hour_end += timedelta(hours=1)
    return dd


dd = defaultdict(lambda: {'entries':0,'exits':0})
all_data = [{ 'endTime': '03-25-14T21:40:30',
              'entriesDuringPeriod': 158,
              'exitsDuringPeriod': 597,
              'startTime': '03-25-14T17:03:23' },
            { 'endTime': '03-26-14T01:00:00',
              'entriesDuringPeriod': 29,
              'exitsDuringPeriod': 235,
              'startTime': '03-25-14T21:00:00' }]

for datum in all_data:
    add_datum(dd, datum)

# dict.values() returns a view in Python 3, so build the sorted list with sorted().
res = sorted(dd.values(), key=lambda i: i['hour'])

print(res)
# [{'entries': 32.28038732182594,
#   'exits': 121.97083057677271,
#   'hour': datetime.datetime(2014, 3, 25, 17, 0)},
#  {'entries': 34.209418415829674,
#   'exits': 129.25963793829314,
#   'hour': datetime.datetime(2014, 3, 25, 18, 0)},
#  {'entries': 34.209418415829674,
#   'exits': 129.25963793829314,
#   'hour': datetime.datetime(2014, 3, 25, 19, 0)},
#  {'entries': 34.209418415829674,
#   'exits': 129.25963793829314,
#   'hour': datetime.datetime(2014, 3, 25, 20, 0)},
#  {'entries': 30.34135743068503,
#   'exits': 146.00025560834786,
#   'hour': datetime.datetime(2014, 3, 25, 21, 0)},
#  {'entries': 7.25,
#   'exits': 58.75,
#   'hour': datetime.datetime(2014, 3, 25, 22, 0)},
#  {'entries': 7.25,
#   'exits': 58.75,
#   'hour': datetime.datetime(2014, 3, 25, 23, 0)},
#  {'entries': 7.25,
#   'exits': 58.75,
#   'hour': datetime.datetime(2014, 3, 26, 0, 0)}]
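Since the result is headed for d3.js, one possible next step is to dump it as JSON. The sketch below is only an illustration: `to_json_rows` is a hypothetical helper name, and the two sample rows are copied from the last hours of the output above. `datetime` objects are not JSON-serializable by default, so each `hour` is converted to an ISO 8601 string first:

```python
import json
from datetime import datetime

# Two rows shaped like the hourly result (values copied from the output above).
hourly = [
    {'hour': datetime(2014, 3, 25, 22, 0), 'entries': 7.25, 'exits': 58.75},
    {'hour': datetime(2014, 3, 25, 23, 0), 'entries': 7.25, 'exits': 58.75},
]

def to_json_rows(rows):
    # Hypothetical helper: replace each datetime with an ISO 8601 string.
    return [
        {'hour': r['hour'].isoformat(),
         'entries': r['entries'],
         'exits': r['exits']}
        for r in rows
    ]

payload = json.dumps(to_json_rows(hourly), indent=2)
print(payload)
```

The strings carry no timezone offset, so how they are interpreted on the JavaScript side (local time or UTC) is something to decide explicitly when parsing them.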
