Binning NumPy array by hour within a datetime field

Question

I've been struggling in Python with how to group records from a CSV file by the hour within a field containing dates and times. The file contains approximately 1,000,000 records. I've read the file into a pandas DataFrame and created a two-dimensional NumPy array such that each record is a sublist within the NumPy array, e.g.:

# this is a NumPy array
npdata = [[somedata, '2014-07-01 08:18:21', somedata, somedata, somedata, somedata, ...],
          [somedata, '2014-07-01 10:01:40', somedata, somedata, somedata, somedata, ...],
          ...]

The date and time, stored as a string, is always in the same position (index 1) in every sublist. I've created the variable "hourlist", which is a list of 24 empty sublists. I'd like to iterate over "npdata" and populate each of the 24 sublists in "hourlist" with the sublists from "npdata" that share the same hour in the date and time field. E.g. all "npdata" sublists with time 00:xx:xx would go in one sublist of "hourlist", all 01:xx:xx in another, all 02:xx:xx in another, and so on for hours 0 through 23. I've been trying to figure this out but keep hitting walls. Based on some Google searches, I believe that the datetime.strptime() class method should be part of the solution, but I don't understand how to use it.

I'd really appreciate any tips or advice.

dmargol1 · Accepted Answer

Given the format of the time string, the hour is always going to be at positions [11:13] of the string, and it will always be an integer.

So simply write a function to get that integer and use it as an index like so:

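def get_hour(in_array):
    # The timestamp is at index 1; characters 11 and 12 hold the two-digit hour.
    return int(in_array[1][11:13])

# Append each record to the sublist for its hour (0 through 23).
for x in npdata:
    hourlist[get_hour(x)].append(x)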
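
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample rows and the get_hour_parsed helper are made up for illustration and are not from the question. It assumes every timestamp follows the 'YYYY-MM-DD HH:MM:SS' layout shown above. It also shows how datetime.strptime(), which the question mentions, could be used in place of slicing.

```python
from datetime import datetime

import numpy as np

# Hypothetical sample rows; the timestamp string sits at index 1, as in the question.
npdata = np.array([
    ["a", "2014-07-01 08:18:21", "x"],
    ["b", "2014-07-01 10:01:40", "y"],
    ["c", "2014-07-02 08:59:59", "z"],
])


def get_hour(row):
    # Slice approach: characters 11 and 12 of the timestamp are the hour.
    return int(row[1][11:13])


def get_hour_parsed(row):
    # Alternative (illustrative helper): parse the whole timestamp.
    # Slower than slicing, but raises ValueError on a malformed string.
    return datetime.strptime(str(row[1]), "%Y-%m-%d %H:%M:%S").hour


# Build 24 independent lists; [[]] * 24 would repeat one shared list 24 times.
hourlist = [[] for _ in range(24)]

for row in npdata:
    hourlist[get_hour(row)].append(row)

print(len(hourlist[8]), len(hourlist[10]))
```

Slicing is probably the faster choice for a million rows, while the strptime() version is stricter about bad input. Both should return the same hour for well-formed timestamps.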
