Reputation: 5126
I have a list of strings containing timestamp-based filenames (date_millisecondtime.csv) like these:
[..., file_20181105_110001.csv, file_20181105_120002.csv, file_20181105_130002.csv, file_20181105_140002.csv, file_20181105_150003.csv, file_20181105_160002.csv, file_20181105_170002.csv, file_20181105_200002.csv,
file_20181105_210002.csv, file_20181106_010002.csv, file_20181106_020002.csv, file_20181106_030002.csv...]
So here there are files dated 2018-11-05 (Nov 5, 2018) for hours 11, 12, 13, 14, 15, 16, 17, 20 and 21.
I want to print only the filenames for hours 18 and 19, as they are missing. The valid hour range is 1-23, so if an hour in this range has no file for a given day (here 2018-11-05), print the filenames for those missing hours.
Upvotes: 2
Views: 779
Reputation: 772
Another solution, in case you also need to check for files missing at the beginning or end of the list (e.g. hours 0-10, 22 and 23):
filenames = ['file_20181105_110001.csv', 'file_20181105_120002.csv', 'file_20181105_150003.csv']
pos = 0  # index of the next filename to match; assumes the list is sorted
for h in range(24):  # hours 0-23 inclusive
    n = "file_20181105_" + str(h).zfill(2)
    if pos < len(filenames) and n == filenames[pos][: len(n)]:
        print("Found", h)
        pos += 1
    else:
        print("Not found", h)
Of course, you can build the prefix n for the day you want to go through in several different ways. If needed, you can add another loop to go through the days.
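As one way to build those prefixes for an arbitrary day, here is a minimal sketch that uses `datetime.date.strftime`. The helper name `hour_prefixes` and its `hours` parameter are hypothetical, not part of the original answer, and the sketch assumes the `file_YYYYMMDD_HH...` naming shown in the question.

```python
from datetime import date

def hour_prefixes(day, hours=range(24)):
    """Return the expected filename prefix for each hour of `day`."""
    stamp = day.strftime("%Y%m%d")  # e.g. "20181105"
    return [f"file_{stamp}_{h:02d}" for h in hours]

prefixes = hour_prefixes(date(2018, 11, 5))
print(prefixes[0], prefixes[-1])
```

Each prefix can then be compared against the start of a filename, as in the loop above.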
Edit:
If we want to check more than one day, we can loop through the days, checking each day's files/hours.
IMHO, I would suggest many changes to the following code depending on the use case, number of days, number of filenames, preference, code style, etc.
filenames = ['file_20181104_110001.csv', 'file_20181105_120002.csv', 'file_20181105_150003.csv']
pos = 0  # index of the next filename to match; assumes the list is sorted by day and hour
missing = []  # (day, hour) pairs with no matching file
for d in (4, 5):
    for h in range(24):  # hours 0-23 inclusive
        n = "file_201811" + str(d).zfill(2) + "_" + str(h).zfill(2)
        if pos < len(filenames) and n == filenames[pos][: len(n)]:
            pos += 1
            print("Found", d, h)
        else:
            missing.append((d, h))
            print("Not Found", d, h)
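If the list is not guaranteed to be sorted, a different technique is to group the hours by day with a dictionary of sets instead of walking a position pointer. The sketch below is an alternative to the loop above, not the same algorithm. The function name `missing_hours_by_day` is hypothetical, and the sketch assumes every filename has exactly the `file_YYYYMMDD_HHMMSS.csv` shape.

```python
from collections import defaultdict

def missing_hours_by_day(filenames, hours=range(24)):
    """Map each date string found in `filenames` to its sorted missing hours."""
    seen = defaultdict(set)
    for name in filenames:
        # "file_20181105_120002.csv" -> ["file", "20181105", "120002.csv"]
        _, day, time_part = name.split("_")
        seen[day].add(int(time_part[:2]))
    return {day: sorted(set(hours) - present) for day, present in seen.items()}

filenames = ['file_20181104_110001.csv', 'file_20181105_120002.csv', 'file_20181105_150003.csv']
result = missing_hours_by_day(filenames)
print(result["20181105"])
```

Note that a day with no files at all never appears in the result, so such days would need to be added separately.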
Upvotes: 0
Reputation: 164703
One solution is to use a set comprehension to extract the times present. If I understand your requirement, you can then calculate the min and max times and take the difference from a set derived from a range:
L = ['file_20181105_110001.csv', 'file_20181105_120002.csv', 'file_20181105_130002.csv',
     'file_20181105_140002.csv', 'file_20181105_150003.csv', 'file_20181105_160002.csv',
     'file_20181105_170002.csv', 'file_20181105_200002.csv', 'file_20181105_210002.csv']
present = {int(i.rsplit('_', 1)[-1][:2]) for i in L}
min_time, max_time = min(present), max(present)
res = set(range(min_time, max_time)) - present  # {18, 19}
You can then build your filenames from the missing times. I'll leave this as an exercise [hint: list comprehension].
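One possible take on that exercise is sketched here. Because the minutes and seconds of a missing file cannot be known, the sketch only builds the `file_<date>_<hour>` prefix; the variable names `day`, `missing_hours` and `missing_prefixes` are illustrative assumptions.

```python
day = "20181105"
missing_hours = {18, 19}  # e.g. the set of missing hours computed earlier

# Zero-pad each hour to two digits and attach it to the date prefix.
missing_prefixes = [f"file_{day}_{h:02d}" for h in sorted(missing_hours)]
print(missing_prefixes)  # ['file_20181105_18', 'file_20181105_19']
```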
Upvotes: 2