Reputation: 95
Hey, sorry for being a rookie with Python; I just need a pointer in the right direction.
Here is the data I am working with:
2018-12-14 14:36:00
2018-12-15 02:00:00
2018-12-15 04:48:00
2018-12-16 06:12:00
2018-12-16 11:28:00
2018-12-16 23:52:00
What I need to do is keep only the last entry for each day. Is it wise to take the values, which are stored in a NumPy array, and append them to a list? Would that be my best option? I ask because I know I cannot delete values from a NumPy array. Is this easier to do with a list or with a NumPy array?
Upvotes: 4
Views: 557
Reputation: 26315
If your data is not sorted, you could also store your dates in a collections.defaultdict(), then take the maximum datetime for each day:
from collections import defaultdict
from datetime import datetime

data = [
    '2018-12-14 14:36:00',
    '2018-12-15 02:00:00',
    '2018-12-15 04:48:00',
    '2018-12-16 06:12:00',
    '2018-12-16 11:28:00',
    '2018-12-16 23:52:00'
]

# Group the full timestamps by their date part.
d = defaultdict(list)
for item in data:
    date, _ = item.split()
    d[date].append(item)

# For each day, keep the entry with the latest parsed datetime.
print([max(v, key=lambda x: datetime.strptime(x, "%Y-%m-%d %H:%M:%S")) for _, v in d.items()])
# ['2018-12-14 14:36:00', '2018-12-15 04:48:00', '2018-12-16 23:52:00']
The above passes datetime.datetime.strptime() as the key parameter of max(), so the entries for each day are compared as datetimes to find the latest one.
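Because these particular timestamps are zero-padded, fixed-width `YYYY-MM-DD HH:MM:SS` strings, plain string comparison already matches chronological order. A minimal variant, assuming every entry follows exactly that format, can therefore skip the `strptime()` key:

```python
from collections import defaultdict

data = [
    '2018-12-14 14:36:00',
    '2018-12-15 02:00:00',
    '2018-12-15 04:48:00',
    '2018-12-16 06:12:00',
    '2018-12-16 11:28:00',
    '2018-12-16 23:52:00',
]

# Group the full timestamps by their date part ("YYYY-MM-DD").
by_day = defaultdict(list)
for item in data:
    day = item.split()[0]
    by_day[day].append(item)

# Assumes zero-padded "YYYY-MM-DD HH:MM:SS" strings, for which
# lexicographic order is the same as chronological order.
last_per_day = [max(items) for items in by_day.values()]
print(last_per_day)
# ['2018-12-14 14:36:00', '2018-12-15 04:48:00', '2018-12-16 23:52:00']
```

If the format can vary (missing zero padding, 12-hour clock, time zones), stick with the `strptime()` key.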
If your data is already sorted, you can take advantage of itertools.groupby():
from itertools import groupby
data = [
'2018-12-14 14:36:00',
'2018-12-15 02:00:00',
'2018-12-15 04:48:00',
'2018-12-16 06:12:00',
'2018-12-16 11:28:00',
'2018-12-16 23:52:00'
]
print([list(g)[-1] for _, g in groupby(data, key=lambda x: x.split()[0])])
# ['2018-12-14 14:36:00', '2018-12-15 04:48:00', '2018-12-16 23:52:00']
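`groupby()` only merges *consecutive* items that share a key, which is why this version needs sorted input. A small illustration, using a deliberately shuffled, hypothetical input:

```python
from itertools import groupby

# Hypothetical unsorted input: the 2018-12-15 entries are not adjacent.
unsorted = [
    '2018-12-15 02:00:00',
    '2018-12-14 14:36:00',
    '2018-12-15 04:48:00',
]

def day(ts):
    # Date part of a "YYYY-MM-DD HH:MM:SS" string.
    return ts.split()[0]

# Without sorting, 2018-12-15 is split into two separate groups.
naive = [list(g)[-1] for _, g in groupby(unsorted, key=day)]
print(naive)
# ['2018-12-15 02:00:00', '2018-12-14 14:36:00', '2018-12-15 04:48:00']

# Sorting first (O(n log n)) gives exactly one group per day.
fixed = [list(g)[-1] for _, g in groupby(sorted(unsorted), key=day)]
print(fixed)
# ['2018-12-14 14:36:00', '2018-12-15 04:48:00']
```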
Upvotes: 0
Reputation: 362717
I noticed that your times are ordered. If that is always the case, you can simply convert to a dict: later entries overwrite earlier ones with the same date key, so only the last entry for each day is kept:
>>> times
['2018-12-14 14:36:00',
'2018-12-15 02:00:00',
'2018-12-15 04:48:00',
'2018-12-16 06:12:00',
'2018-12-16 11:28:00',
'2018-12-16 23:52:00']
>>> print(*dict(s.split() for s in times).items(), sep='\n')
('2018-12-14', '14:36:00')
('2018-12-15', '04:48:00')
('2018-12-16', '23:52:00')
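If you want the full timestamp strings back rather than `(date, time)` pairs, a dict comprehension keyed on the date part works the same way: assigning to an existing key overwrites its value without moving it, and dicts preserve insertion order (guaranteed since Python 3.7). Like the answer above, this sketch assumes the input is already in chronological order:

```python
times = [
    '2018-12-14 14:36:00',
    '2018-12-15 02:00:00',
    '2018-12-15 04:48:00',
    '2018-12-16 06:12:00',
    '2018-12-16 11:28:00',
    '2018-12-16 23:52:00',
]

# Each later timestamp for the same date overwrites the earlier one,
# so only the last entry per date survives (assumes sorted input).
last_per_day = list({s.split()[0]: s for s in times}.values())
print(last_per_day)
# ['2018-12-14 14:36:00', '2018-12-15 04:48:00', '2018-12-16 23:52:00']
```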
If the ordering is not reliable, do not sort: that would degrade the time complexity to O(n log n) for what is an O(n) task. Instead, build a dict (keys are dates, values are lists of times) and take the maximum of each value list.
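A minimal sketch of that O(n) approach, parsing each string once with `datetime.strptime()` and keying the dict on the `date()` of each timestamp. The shuffled input is hypothetical and only there to show that order does not matter:

```python
from datetime import datetime

# Hypothetical input in no particular order.
times = [
    '2018-12-16 11:28:00',
    '2018-12-14 14:36:00',
    '2018-12-15 04:48:00',
    '2018-12-16 23:52:00',
    '2018-12-15 02:00:00',
    '2018-12-16 06:12:00',
]

FMT = '%Y-%m-%d %H:%M:%S'

# Single pass: dates are keys, values are lists of parsed datetimes.
by_day = {}
for s in times:
    ts = datetime.strptime(s, FMT)
    by_day.setdefault(ts.date(), []).append(ts)

# Take the maximum of each list; days appear in first-seen order.
last_per_day = [max(v).strftime(FMT) for v in by_day.values()]
print(last_per_day)
# ['2018-12-16 23:52:00', '2018-12-14 14:36:00', '2018-12-15 04:48:00']
```

No sort is needed for correctness; if you want the days in calendar order, sorting only the dict keys costs O(k log k) for k distinct days rather than sorting all n entries.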
Upvotes: 4
Reputation: 8942
If pandas is an option, you can use the grouping facilities of pandas DataFrames.
The following makes two assumptions:
You have added an extra column to group on, which drops the time from the timestamp so that you can group by day.
my_dataframe.groupby(['my_date_only_column']).my_timestamp_column.last()
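A minimal sketch of that idea, assuming pandas is installed and reusing the answer's placeholder names (`my_dataframe`, `my_timestamp_column`, `my_date_only_column`). Here `.dt.date` is one way to build the date-only column, and the rows are sorted first because `last()` picks the last non-null value of each group in the frame's current row order:

```python
import pandas as pd

# Placeholder frame holding the question's timestamps.
my_dataframe = pd.DataFrame({
    'my_timestamp_column': pd.to_datetime([
        '2018-12-14 14:36:00',
        '2018-12-15 02:00:00',
        '2018-12-15 04:48:00',
        '2018-12-16 06:12:00',
        '2018-12-16 11:28:00',
        '2018-12-16 23:52:00',
    ])
})

# Extra column that discards the time of day so rows can be grouped by day.
my_dataframe['my_date_only_column'] = my_dataframe['my_timestamp_column'].dt.date

# Sort by timestamp in case the rows are not already ordered,
# then keep the last timestamp in each day's group.
result = (
    my_dataframe.sort_values('my_timestamp_column')
    .groupby(['my_date_only_column'])
    .my_timestamp_column.last()
)
print(result)
```

The result is a Series indexed by date, with the latest timestamp of each day as its values.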
Upvotes: 0