Reputation: 1160

Grouping timestamp strings by day in python

I have a number of files, in date order, with names in the format YYMMDD_hhmmss.txt. I want to group the files by day only.

There are 24 files per day, one for each hour. I want to collect all the files for each day into a separate list.

day = 1
list_for_a_day = []

for filename in all_files:
     if '%s' % (day) in filename:
          list_for_a_day.append(filename)
          day += 1
          if day > 31:
             pass

This is clearly the wrong way to go about it. If I have 3 days' worth of files, each day having 24 files, that's 72 files in total, and I'd want 3 lists, each containing the files for one day.

Upvotes: 1

Answers (4)

mhawke

Reputation: 87114

I'd go for a defaultdict of lists. The keys for the dict would be the date. The values would be a list of the file names for that date.

from glob import glob
from datetime import datetime
from collections import defaultdict

files_for_date = defaultdict(list)

for filename in glob('*.txt'):
    try:
        # %y matches the two-digit year in YYMMDD_hhmmss.txt
        date = datetime.strptime(filename, '%y%m%d_%H%M%S.txt').date()
        files_for_date[date].append(filename)
    except ValueError:
        # The name does not match the expected pattern
        print('Skipping file {}'.format(filename))

After this loop, files_for_date holds the file names for each day (date) grouped into lists, keyed by a datetime.date object.

If you prefer, you can convert the date object to a string using str(date) or with strftime(), e.g.

files_for_date[date.strftime('%Y%m%d')].append(filename)

would result in string keys of format YYYYMMDD.
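To get from the dict to the separate per-day lists the question asks for, something like the following should work. This is only a sketch: the sample file names are made up for illustration, and it assumes the two-digit-year YYMMDD_hhmmss.txt pattern from the question.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample names in the question's YYMMDD_hhmmss.txt format.
sample_files = [
    '170101_000000.txt',
    '170101_010000.txt',
    '170102_000000.txt',
    'notes.txt',  # does not match the pattern, so it is skipped
]

files_for_date = defaultdict(list)
for filename in sample_files:
    try:
        date = datetime.strptime(filename, '%y%m%d_%H%M%S.txt').date()
    except ValueError:
        continue
    files_for_date[date].append(filename)

# One list per day, in chronological order.
lists_per_day = [files_for_date[date] for date in sorted(files_for_date)]
```

Sorting the keys works because datetime.date objects compare chronologically.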

Upvotes: 0

Gsk

Reputation: 2945

If you don't want to handle edge cases such as leap years and the number of days in each month by hand, you can let datetime step through the calendar:

import datetime

files_by_day = {}
current_date = datetime.datetime(year=2016, month=1, day=1)
for _ in range(366):  # 2016 is a leap year, so it has 366 days
    # Two-digit year, month and day, matching the YYMMDD file name prefix
    key = current_date.strftime('%y%m%d')
    matches = [filename for filename in all_files if filename[-17:-11] == key]
    if matches:
        files_by_day[key] = matches
    current_date += datetime.timedelta(days=1)

This steps through every day from 1/1/2016 to 31/12/2016 and, for each day, collects the files that have that date at the given position in their name. Hope it's helpful.

Upvotes: 0

atomAltera

Reputation: 1781

I think you should use a dict mapping int to a list of strings for this:

def sep_file_by_days(filename_list):
    filenames_by_day = dict()

    for filename in filename_list:
        # Characters 4-5 of YYMMDD_hhmmss.txt are the day of the month
        day = int(filename[4:6])

        if day not in filenames_by_day:
            filenames_by_day[day] = []

        filenames_by_day[day].append(filename)

    return filenames_by_day
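One caveat: keying on the day of the month alone puts files from different months into the same list (170105 and 170205 both map to 5). A minimal sketch that keys on the full date prefix instead, assuming every name starts with YYMMDD; sep_file_by_date is a hypothetical name and the sample file names are made up:

```python
def sep_file_by_date(filename_list):
    """Group file names by their leading YYMMDD prefix."""
    filenames_by_date = {}
    for filename in filename_list:
        date_key = filename[:6]  # 'YYMMDD' part of 'YYMMDD_hhmmss.txt'
        filenames_by_date.setdefault(date_key, []).append(filename)
    return filenames_by_date


# Hypothetical sample names: same day of the month, different months.
grouped = sep_file_by_date([
    '170105_000000.txt',
    '170205_000000.txt',
    '170105_010000.txt',
])
```

Here the two files from 5 January end up in one list and the file from 5 February in another.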

Upvotes: 0

cs95

Reputation: 402813

How about using a dictionary? Here's a high level outline of how I'm doing it.

1. Iterate over all your filenames.
2. For each filename, extract the day attribute (I'm just using string splitting, which should work assuming your file name structures are consistent).
3. Add that file to a list indexed by day in a dictionary.

files = {}
for filename in all_files:
    day = filename.split('_')[0][-2:]   
    files.setdefault(day, []).append(filename)

files would look something like this:

{ 
    day1 : [f11, f12, ...],
    day2 : [f21, f22, ...], 
    ...
}

Note that the keys are strings, but they could just as easily be integers, provided you cast day to int in advance.
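If you want plain lists rather than a dictionary, itertools.groupby is another option, since the question says the files are already in date order. This is a different technique from the setdefault loop above, shown only as a sketch: the sample names and the day_key helper are made up, and it groups on the whole YYMMDD part so that the same day in different months stays separate.

```python
from itertools import groupby

# Hypothetical sample names in the question's YYMMDD_hhmmss.txt format.
all_files = [
    '170101_000000.txt',
    '170101_010000.txt',
    '170102_000000.txt',
    '170103_000000.txt',
]


def day_key(filename):
    # Everything before the underscore, i.e. the 'YYMMDD' part.
    return filename.split('_')[0]


# groupby only merges adjacent items, so sort first as a safeguard.
lists_per_day = [
    list(group) for _, group in groupby(sorted(all_files), key=day_key)
]
```

Each element of lists_per_day is then the list of files for one day.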

Upvotes: 3
