Python: getting histogram bins for datetime objects

Question

I have two lists.

The list times is a list of datetimes from 2018-04-10 00:00 to 2018-04-10 23:59.
For each item in times I have a corresponding label of 0 or 1 recorded in the list labels.

My goal is to get the mean label value (between 0 and 1) for every minute interval.

times = [Timestamp('2018-04-10 00:00:00.118000'),
 Timestamp('2018-04-10 00:00:00.547000'),
 Timestamp('2018-04-10 00:00:00.569000'),
 Timestamp('2018-04-10 00:00:00.690000'),
.
.
.
Timestamp('2018-04-10 23:59:59.999000') ]

labels = [0,1,1,0,1,0,....1]

where len(times) == len(labels)

For every minute interval between 2018-04-10 00:00 and 2018-04-10 23:59, the min and max times in the list respectively, I am trying to get two lists:

1) The start time of the minute interval.

2) The mean average label value of all the datetimes in that interval.

In particular I am having trouble with (2).

Note: the times list is not necessarily chronologically ordered

Noppu · Accepted Answer

First, here is how I generated sample data in the format described above:

from datetime import datetime

import numpy as np
import pandas as pd

size = int(1e6)

# Evenly spaced POSIX timestamps covering 24 hours from now
start = datetime.now().timestamp()
timestamp_a_day = np.linspace(start, start + 24*60*60, size)
# Random jitter of up to one second per sample
dummy_sec = np.random.rand(size)

timestamp_series = pd.Series(timestamp_a_day + dummy_sec)\
.sort_values().reset_index(drop=True)\
.apply(datetime.fromtimestamp)

data = pd.DataFrame({'timestamp': timestamp_series})
# One random 0/1 label per timestamp
data['label'] = np.random.randint(0, 2, size)

Now for the solution (I hope I have understood your question correctly):

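1) Floor each timestamp to the start of its minute ('min', not 's', since you want one-minute intervals):

data['start_interval'] = data['timestamp'].dt.floor('min')

2) Group by that start time and take the mean label of each group:

data.groupby('start_interval')['label'].mean()

The index of the result holds the interval start times and the values hold the mean labels. The order of the input does not matter, because groupby sorts the group keys by default.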
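As a minimal sketch of how the same floor-and-groupby idea could be applied directly to the two plain lists from the question: the sample values below are made up for illustration and deliberately out of order, and the timestamps are floored to the minute because the question asks for one-minute intervals. The reindex step at the end is an optional extra, assuming you also want minutes that contain no data to appear (as NaN).

```python
import pandas as pd

# Hypothetical sample data in the question's format (unordered on purpose).
times = [
    pd.Timestamp('2018-04-10 00:01:30.000'),
    pd.Timestamp('2018-04-10 00:00:00.118'),
    pd.Timestamp('2018-04-10 00:00:00.547'),
    pd.Timestamp('2018-04-10 00:03:10.250'),
    pd.Timestamp('2018-04-10 00:01:05.690'),
]
labels = [1, 0, 1, 1, 0]

df = pd.DataFrame({'timestamp': times, 'label': labels})

# Round every timestamp down to the start of its minute.
df['start_interval'] = df['timestamp'].dt.floor('min')

# Mean label per minute; groupby sorts the interval starts by default.
per_minute = df.groupby('start_interval')['label'].mean()

# The two lists asked for in the question.
interval_starts = per_minute.index.tolist()
mean_labels = per_minute.tolist()

# Optional: include minutes with no observations (their mean is NaN).
full_range = pd.date_range(per_minute.index.min(),
                           per_minute.index.max(), freq='min')
per_minute_full = per_minute.reindex(full_range)
```

Here interval_starts and mean_labels line up element by element, so they can be passed straight to a plotting call. Whether empty minutes should show up as NaN or be dropped depends on what you do with the result afterwards.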
