Evix

Reputation: 53

Remove certain dates in list. Python 3.4

I have a list that contains several days. Each day has several timestamps. What I want to do is make a new list that keeps only the start time and the end time for each date. I also want to remove the character between the date and the time in each entry; it is always the same letter. The number of timestamps can vary from date to date.

Since I'm new to Python, simple, easy-to-understand code would be preferred. I've been using a lot of regex, so please use that if there is a way to do it with regex.

The list has been sorted with list.sort(), so it's in the correct order.

The code used to extract the information was the following:

import re

list1 = []
with open("test.txt", "r") as file1:
    for f in file1:
        # collect every timestamp such as 2015-12-28A09:30
        list1 += re.findall(r'20\d\d-\d\d-\d\dA\d\d:\d\d', f)
listX = len(list1)
list2 = list1[0:listX - 2]
list2.sort()

Here is how the list looks:

2015-12-28A09:30
2015-12-28A09:30
2015-12-28A09:35
2015-12-28A09:35
2015-12-28A12:00
2015-12-28A12:00
2015-12-28A12:15
2015-12-28A12:15
2015-12-28A14:30
2015-12-28A14:30
2015-12-28A15:15
2015-12-28A15:15
2015-12-28A16:45
2015-12-28A16:45
2015-12-28A17:00
2015-12-28A17:00
2015-12-28A18:15
2015-12-28A18:15
2015-12-29A08:30
2015-12-29A08:30
2015-12-29A08:35
2015-12-29A08:35
2015-12-29A10:45
2015-12-29A10:45
2015-12-29A11:00
2015-12-29A11:00
2015-12-29A13:15
2015-12-29A13:15
2015-12-29A14:00
2015-12-29A14:00
2015-12-29A15:30
2015-12-29A15:30
2015-12-29A15:45
2015-12-29A15:45
2015-12-29A17:15
2015-12-29A17:15
2015-12-30A08:30
2015-12-30A08:30
2015-12-30A08:35
2015-12-30A08:35
2015-12-30A10:45
2015-12-30A10:45
2015-12-30A11:00
2015-12-30A11:00
2015-12-30A13:00
2015-12-30A13:00
2015-12-30A13:45
2015-12-30A13:45
2015-12-30A15:15
2015-12-30A15:15
2015-12-30A15:30
2015-12-30A15:30
2015-12-30A17:15
2015-12-30A17:15

And this is how I want it to look:

2015-12-28 09:30
2015-12-28 18:15
2015-12-29 08:30
2015-12-29 17:15
2015-12-30 08:30
2015-12-30 17:15

Upvotes: 2

Views: 759

Answers (2)

Padraic Cunningham

Reputation: 180550

Because your data is ordered, you just need to pull the first and last value from each group. You can use str.replace to swap the single letter for a space, then split each string and compare just the date parts:

def grp(l):
    it = iter(l)
    prev = start = next(it).replace("A", " ")
    for dte in it:
        dte = dte.replace("A", " ")
        # if we have a new date, yield the previous date's start and end
        if dte.split(None, 1)[0] != prev.split(None, 1)[0]:
            yield start
            yield prev
            start = dte
        prev = dte
    # yield the start and end of the final date as two separate items
    yield start
    yield prev
l=["2015-12-28A09:30", "2015-12-28A09:30", .....................
l[:] = grp(l)

This could certainly also be done as you process the file, without sorting, by using a dict to group:

from re import findall

from collections import defaultdict

with open("dates.txt") as f:
    # "null" sorts after any "20..." string and "" sorts before it,
    # so the first timestamp seen for a date replaces both defaults
    od = defaultdict(lambda: {"min": "null", "max": ""})
    for line in f:
        for dte in findall(r'20\d\d-\d\d-\d\dA\d\d:\d\d', line):
            dte, tme = dte.split("A")
            _dte = "{} {}".format(dte, tme)
            if od[dte]["min"] > _dte:
                od[dte]["min"] = _dte
            if od[dte]["max"] < _dte:
                od[dte]["max"] = _dte

    print(list(od.values()))

Which will give you the start and end time for each date (in arbitrary order, since a dict is unordered in Python 3.4):

[{'min': '2016-01-03 23:59', 'max': '2016-01-03 23:59'}, 
{'min': '2015-12-28 00:00', 'max': '2015-12-28 18:15'}, 
{'min': '2015-12-30 08:30', 'max': '2015-12-30 17:15'}, 
{'min': '2015-12-29 08:30', 'max': '2015-12-29 17:15'}, 
{'min': '2015-12-15 08:41', 'max': '2015-12-15 08:41'}]

Note that in this data the start for 2015-12-28 is 00:00, not 09:30.

If your dates are actually as posted, one per line, you don't need a regex either:

from collections import defaultdict

with open("dates.txt") as f:
    od = defaultdict(lambda: {"min": "null", "max": ""})
    for line in f:
        # each line looks like 2015-12-28A09:30
        dte, tme = line.rstrip().split("A")
        _dte = "{} {}".format(dte, tme)
        if od[dte]["min"] > _dte:
            od[dte]["min"] = _dte
        if od[dte]["max"] < _dte:
            od[dte]["max"] = _dte

print(list(od.values()))

Which would give you the same output.
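
As a rough sketch of how the dict-based grouping could produce the flat list the question asks for, the variant below collects the times per date and then emits the earliest and latest for each date in sorted order. The function name first_last_per_day and the in-memory sample list are assumptions made for illustration; they are not part of the code above.

```python
from collections import defaultdict

def first_last_per_day(stamps, sep="A"):
    """Return [start, end, start, end, ...] with one pair per date."""
    by_day = defaultdict(list)
    for stamp in stamps:
        day, time = stamp.split(sep)
        by_day[day].append(time)
    result = []
    # sorted() gives a stable date order even though the dict itself is unordered
    for day in sorted(by_day):
        times = by_day[day]
        # zero-padded HH:MM strings compare correctly as plain text
        result.append("{} {}".format(day, min(times)))
        result.append("{} {}".format(day, max(times)))
    return result

# hypothetical, deliberately unsorted sample
sample = ["2015-12-29A17:15", "2015-12-28A09:30", "2015-12-28A18:15",
          "2015-12-29A08:30", "2015-12-28A12:00"]
for line in first_last_per_day(sample):
    print(line)
```

Because the minimum and maximum are computed per date, the input does not need to be sorted first.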

Upvotes: 0

poke

Reputation: 388463

First of all, you should convert all your strings into proper dates that Python can work with. That way you have a lot more control over them, for example when changing the formatting later. So let’s parse the entries in list2 using datetime.strptime:

from datetime import datetime
dates = [datetime.strptime(item, '%Y-%m-%dA%H:%M') for item in list2]

This creates a new list dates that contains all your dates from list2, but as parsed datetime objects.
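
To see what the format string does on a single entry, here is a small standalone check. The literal "A" in the format simply matches the separator letter in the input; the variable names stamp and parsed are only illustrative.

```python
from datetime import datetime

stamp = "2015-12-28A09:30"
# %Y-%m-%d matches the date, the literal A matches the separator, %H:%M the time
parsed = datetime.strptime(stamp, "%Y-%m-%dA%H:%M")

print(parsed)                    # 2015-12-28 09:30:00
print(parsed.date())             # 2015-12-28
print(parsed.strftime("%H:%M"))  # 09:30
```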

Now, since you want to get the first and the last date of each day, we somehow have to group your dates by the date component. There are various ways to do that. I’ll be using itertools.groupby for it, with a key function that just looks at the date component of each entry:

from itertools import groupby
for day, times in groupby(dates, lambda x: x.date()):
    first, *mid, last = times
    print(first)
    print(last)

If we run this, we already get your output (without date formatting):

2015-12-28 09:30:00
2015-12-28 18:15:00
2015-12-29 08:30:00
2015-12-29 17:15:00
2015-12-30 08:30:00
2015-12-30 17:15:00

Of course, you can also collect that first and last date in a list first to process the dates later:

filteredDates = []
for day, times in groupby(dates, lambda x: x.date()):
    first, *mid, last = times
    filteredDates.append(first)
    filteredDates.append(last)

And you can also output your dates with a different format using datetime.strftime:

for date in filteredDates:
    print(date.strftime('%Y-%m-%d %H:%M'))

That would give us the following output:

2015-12-28 09:30
2015-12-28 18:15
2015-12-29 08:30
2015-12-29 17:15
2015-12-30 08:30
2015-12-30 17:15
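
One caveat about the first, *mid, last = times unpacking used above: it needs at least two entries per day, so a day with a single timestamp raises a ValueError. A minimal sketch of one way around that, assuming a single-entry day should report the same timestamp as both start and end, is to turn each group into a list and index it. The names stamps and filtered are illustrative only.

```python
from datetime import datetime
from itertools import groupby

# hypothetical sample: the second day has only one timestamp
stamps = ["2015-12-28A09:30", "2015-12-28A18:15", "2015-12-29A08:30"]
dates = [datetime.strptime(s, "%Y-%m-%dA%H:%M") for s in stamps]

filtered = []
for day, times in groupby(dates, key=lambda d: d.date()):
    times = list(times)          # a groupby group can only be iterated once
    filtered.append(times[0])    # first entry of the day
    filtered.append(times[-1])   # last entry (same object if only one exists)

for date in filtered:
    print(date.strftime("%Y-%m-%d %H:%M"))
```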

If you don’t want to go the route of parsing those dates, you could also do this by working directly on the strings. Since they are nicely formatted (i.e. they can be easily compared), the same grouping works. It would look like this:

for day, times in groupby(list2, lambda x: x[:10]):
    first, *mid, last = times
    print(first)
    print(last)

Producing the following output:

2015-12-28A09:30
2015-12-28A18:15
2015-12-29A08:30
2015-12-29A17:15
2015-12-30A08:30
2015-12-30A17:15
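
The string-based output still contains the separator letter. As a hedged sketch, assuming every entry has the fixed layout of a 10-character date, one separator character, then the time, the separator can be swapped for a space by slicing. The short list2 and the name result below are sample values for illustration.

```python
from itertools import groupby

# hypothetical, already sorted sample in the same format as list2
list2 = ["2015-12-28A09:30", "2015-12-28A12:00", "2015-12-28A18:15",
         "2015-12-29A08:30", "2015-12-29A17:15"]

result = []
for day, times in groupby(list2, key=lambda s: s[:10]):
    times = list(times)
    for stamp in (times[0], times[-1]):
        # characters 0-9 are the date, 10 is the separator, 11 onward is the time
        result.append(stamp[:10] + " " + stamp[11:])

print(result)
```

Slicing by position does not depend on which letter is used as the separator, only on it being a single character.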

Upvotes: 1
