I have two lists. The list times is a list of datetimes from 2018-04-10 00:00 to 2018-04-10 23:59. For each item in times I have a corresponding label of 0 or 1, recorded in the list labels. My goal is to get the mean label value (between 0 and 1) for every minute interval.
times = [Timestamp('2018-04-10 00:00:00.118000'),
         Timestamp('2018-04-10 00:00:00.547000'),
         Timestamp('2018-04-10 00:00:00.569000'),
         Timestamp('2018-04-10 00:00:00.690000'),
         ...
         Timestamp('2018-04-10 23:59:59.999000')]
labels = [0,1,1,0,1,0,....1]
where len(times) == len(labels)
For every minute interval between 2018-04-10 00:00 and 2018-04-10 23:59 (the minimum and maximum times in the list, respectively), I am trying to get two lists:
1) The start time of each minute interval.
2) The mean label value of all the datetimes in that interval.
In particular I am having trouble with (2).
Note: the times list is not necessarily in chronological order.
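For reference, the computation described in the question can be sketched in plain Python without pandas. This is a minimal sketch that assumes times holds datetime.datetime objects; the function name minute_means is hypothetical. Each timestamp is truncated to the start of its minute and the labels are accumulated per minute in a dict, so the input does not need to be sorted. Minutes that contain no timestamps simply do not appear in the output.

```python
from datetime import datetime


def minute_means(times, labels):
    """Hypothetical helper: return (starts, means), where starts holds the
    start of every minute that contains at least one timestamp and means holds
    the mean label in that minute, both in chronological order."""
    if len(times) != len(labels):
        raise ValueError("times and labels must have the same length")
    sums = {}    # minute start -> sum of labels in that minute
    counts = {}  # minute start -> number of timestamps in that minute
    for t, label in zip(times, labels):
        # Truncate to the start of the minute; input order does not matter.
        start = t.replace(second=0, microsecond=0)
        sums[start] = sums.get(start, 0) + label
        counts[start] = counts.get(start, 0) + 1
    starts = sorted(sums)
    means = [sums[s] / counts[s] for s in starts]
    return starts, means


# Small unsorted example
times = [datetime(2018, 4, 10, 0, 1, 30),
         datetime(2018, 4, 10, 0, 0, 0, 118000),
         datetime(2018, 4, 10, 0, 0, 59)]
labels = [1, 0, 1]
starts, means = minute_means(times, labels)
print(means)  # [0.5, 1.0]
```

If the elements are pandas Timestamps instead, t.floor('min') is a safer way to truncate, because Timestamp.replace(second=0, microsecond=0) leaves any nanosecond component untouched.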
times and labels, then sort.
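One reading of the comment above, sketched under the assumption that it means pairing each time with its label and sorting the pairs chronologically: once the pairs are sorted, all entries from the same minute are adjacent, so itertools.groupby can average each run in a single pass. The function name minute_means_sorted is hypothetical, and the sketch assumes datetime.datetime inputs.

```python
from datetime import datetime
from itertools import groupby


def minute_means_sorted(times, labels):
    """Hypothetical helper: sort (time, label) pairs, then average the labels
    of each run of pairs that fall in the same minute."""
    # Pair each time with its label and sort chronologically, since the
    # question notes that times is not necessarily ordered.
    pairs = sorted(zip(times, labels), key=lambda p: p[0])
    starts, means = [], []
    # After sorting, groupby yields each minute exactly once.
    minute_of = lambda p: p[0].replace(second=0, microsecond=0)
    for start, group in groupby(pairs, key=minute_of):
        group_labels = [label for _, label in group]
        starts.append(start)
        means.append(sum(group_labels) / len(group_labels))
    return starts, means
```

Sorting costs O(n log n), whereas the dict-based approach is a single O(n) pass; for a day of data either is fast enough.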
First, here is how I generated data in the format described above:
import numpy as np
import pandas as pd
from datetime import datetime

size = int(1e6)
# One day of evenly spaced POSIX timestamps starting now, plus up to 1 s of random jitter
start = datetime.now().timestamp()
timestamp_a_day = np.linspace(start, start + 24*60*60, size)
dummy_sec = np.random.rand(size)
timestamp_series = pd.Series(timestamp_a_day + dummy_sec)\
    .sort_values().reset_index(drop=True)\
    .apply(lambda x: datetime.fromtimestamp(x))
# A random 0/1 label for every timestamp
data = pd.DataFrame({'timestamp': timestamp_series})
data['label'] = np.random.randint(0, 2, size)
Now let's solve the problem (I hope I understood your question correctly):
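1) Floor each timestamp to the start of its minute interval:
data['start_interval'] = data['timestamp'].dt.floor('min')
2) Group by that start time and average the labels:
data.groupby('start_interval')['label'].mean()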
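Building on the groupby approach above, the sketch below goes straight from the question's two lists to the two requested output lists. It is a minimal sketch, and the function name minute_means_pandas and its include_empty flag are hypothetical. One difference worth knowing: grouping on floored timestamps only produces minutes that contain data, while resample("min") on a DatetimeIndex creates one bin per minute from the first to the last timestamp and gives NaN for empty minutes.

```python
import pandas as pd


def minute_means_pandas(times, labels, include_empty=False):
    """Hypothetical helper: return (starts, means) as plain Python lists."""
    df = pd.DataFrame({"timestamp": pd.to_datetime(times), "label": labels})
    if include_empty:
        # resample needs a DatetimeIndex; sorting first covers unordered input.
        # Minutes without any timestamps get NaN.
        means = (df.set_index("timestamp")["label"]
                   .sort_index()
                   .resample("min")
                   .mean())
    else:
        # Floor to the minute and group on it; only minutes with data appear.
        means = df.groupby(df["timestamp"].dt.floor("min"))["label"].mean()
    starts = means.index.tolist()  # list of pd.Timestamp (1: interval starts)
    return starts, means.tolist()  # list of floats (2: mean label values)
```

For the question's data, minute_means_pandas(times, labels, include_empty=True) would yield one entry per minute of the day, provided the data spans it.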