Reputation: 35
I have files in a particular directory and need to select them one by one based on the filename (yyyymmdd.faifb1p16m2.nc), where yyyy is the year, mm is the month, and dd is the day. I wrote this code:
import os

base_dir = 'C:/DATA2013'
os.chdir(base_dir)
files = os.listdir(base_dir)
# keep only the files whose names end with the expected suffix (one pass, no duplicates)
results = [each for each in files if each.endswith('.faifb1p16m2.nc')]
What should I do next to select only the files for January, then February, and so on? Thank you.
Upvotes: 2
Views: 201
Reputation: 38
Try this:
import os

base_dir = 'C:/local'
month = '01'  # describe the month in this string
for f in os.listdir(base_dir):
    # characters 4-5 of 'yyyymmdd...' are the month
    if f.endswith('.faifb1p16m2.nc') and f[4:6] == month:
        print(f)
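To handle every month without editing the '01' string by hand, the names can be grouped in a single pass. A minimal sketch, assuming the names follow the yyyymmdd.faifb1p16m2.nc pattern; the helper group_by_month and the sample names are hypothetical:

```python
from collections import defaultdict

SUFFIX = '.faifb1p16m2.nc'


def group_by_month(names, suffix=SUFFIX):
    """Return a dict mapping 'mm' strings to the matching filenames."""
    groups = defaultdict(list)
    for name in names:
        # characters 4 and 5 of 'yyyymmdd...' hold the month
        if name.endswith(suffix):
            groups[name[4:6]].append(name)
    return dict(groups)


if __name__ == '__main__':
    sample = ['20130105' + SUFFIX, '20130211' + SUFFIX, '20130120' + SUFFIX, 'notes.txt']
    by_month = group_by_month(sample)
    print(by_month['01'])  # January files
    print(by_month['02'])  # February files
```

With real data you would pass os.listdir(base_dir) instead of the sample list; months with no files simply have no key.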
Upvotes: 0
Reputation: 1062
Two regexes:
\d{4}(?:\d?|\d{2})(?:\d?|\d{2})\.faifb1p16m2\.nc
\d{8}\.faifb1p16m2\.nc
Sample data:
The first regex matches all 7 of those entries; the second matches only entries 1 and 5. I probably made the regexes more complicated than they needed to be.
You're going to want the second regex, but I'm just listing the first as an example.
from glob import glob
import re

re1 = r'\d{4}(?:\d?|\d{2})(?:\d?|\d{2})\.faifb1p16m2\.nc'
re2 = r'\d{8}\.faifb1p16m2\.nc'
l = [f for f in glob('*.faifb1p16m2.nc') if re.search(re1, f)]
m = [f for f in glob('*.faifb1p16m2.nc') if re.search(re2, f)]
print(l)
print()
print(m)
# Then, suppose you want to filter and select everything with '12' (December) as the month in the list m
print(list(filter(lambda x: x[4:6] == '12', m)))
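Instead of slicing x[4:6], the regex itself can capture the month. A sketch, assuming Python 3.4+ for re.fullmatch; NAME_RE and select_month are illustrative names, not part of the answer above. Anchoring with fullmatch also rejects names such as '120131201.faifb1p16m2.nc', which re.search would accept because eight digits appear somewhere inside it:

```python
import re

# year, month and day captured as named groups; fullmatch anchors both ends
NAME_RE = re.compile(r'(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})\.faifb1p16m2\.nc')


def select_month(names, month):
    """Return the names whose mm field equals `month` (an int, 1-12)."""
    selected = []
    for name in names:
        match = NAME_RE.fullmatch(name)
        if match and int(match.group('month')) == month:
            selected.append(name)
    return selected


if __name__ == '__main__':
    names = ['20131201.faifb1p16m2.nc', '120131201.faifb1p16m2.nc', '20130612.faifb1p16m2.nc']
    print(select_month(names, 12))
```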
As another similar solution shows, you can swap glob for os.listdir() (remember to import os), so:
l = [f for f in glob('*.faifb1p16m2.nc') if re.search(re1, f)]
Becomes:
l = [f for f in os.listdir('.') if re.search(re1, f)]
The rest of the code stays the same. One nice thing about glob is that it also provides iglob, which works like glob but returns an iterator instead of a list; that avoids building the whole list in memory when a directory holds lots of files.
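A small sketch of iglob in use, assuming Python 3 (tempfile.TemporaryDirectory and glob.escape); the helper count_month_files and the sample filenames are hypothetical and exist only to make the example self-contained:

```python
import glob
import os
import tempfile

SUFFIX = '.faifb1p16m2.nc'


def count_month_files(directory, month):
    """Count files for one month, consuming glob.iglob lazily."""
    # escape the directory so any glob metacharacters in it are taken literally
    pattern = os.path.join(glob.escape(directory), '*' + SUFFIX)
    count = 0
    # iglob yields one path at a time instead of building a full list
    for path in glob.iglob(pattern):
        if os.path.basename(path)[4:6] == month:
            count += 1
    return count


if __name__ == '__main__':
    # build a throwaway directory with a few empty sample files
    with tempfile.TemporaryDirectory() as tmp:
        for name in ('20130101', '20130102', '20130201'):
            open(os.path.join(tmp, name + SUFFIX), 'w').close()
        print(count_month_files(tmp, '01'))
```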
One more thing: here's another Stack Overflow post with an overview of Python's infamous lambda feature. It's often used with functions such as map, reduce, filter, and so on.
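A brief illustration of lambda with filter, map and reduce on names in the question's format, assuming Python 3 (where reduce lives in functools and filter/map return iterators); the sample names are made up:

```python
from functools import reduce  # in Python 3, reduce lives in functools

names = ['20130105.faifb1p16m2.nc', '20131201.faifb1p16m2.nc', '20131215.faifb1p16m2.nc']

# filter + lambda: keep December files (same test as the answer's lambda)
december = list(filter(lambda x: x[4:6] == '12', names))

# map + lambda: extract the day-of-month field as an int
days = list(map(lambda x: int(x[6:8]), december))

# reduce + lambda: keep the later of each pair; yyyymmdd names sort chronologically
latest = reduce(lambda a, b: a if a > b else b, december)

# the filter call gives the same result as a list comprehension
assert december == [x for x in names if x[4:6] == '12']
print(december, days, latest)
```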
Upvotes: 0
Reputation: 414235
To validate filenames, you could use the datetime.strptime() method:
#!/usr/bin/env python
import os
from datetime import datetime
from glob import glob

suffix = '.faifb1p16m2.nc'

def parse_date(path):
    try:
        return datetime.strptime(os.path.basename(path), '%Y%m%d' + suffix)
    except ValueError:
        return None  # failed to parse

paths_by_month = [[] for _ in range(12 + 1)]
for path in glob(r'C:\DATA2013\*' + suffix):  # for each nc-file in the directory
    date = parse_date(path)
    paths_by_month[date and date.month or 0].append(path)

print(paths_by_month[2])  # February paths
print(paths_by_month[0])  # paths with unrecognized date
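The same strptime check can be exercised on bare filenames without touching the disk. A sketch, assuming the names come from os.listdir() rather than glob; bucket_by_month and the sample names are hypothetical. It uses date.month if date else 0, an explicit form of the date and date.month or 0 idiom, and shows that impossible dates such as February 30 land in bucket 0:

```python
from datetime import datetime

suffix = '.faifb1p16m2.nc'


def parse_date(name):
    """Return a datetime for a 'yyyymmdd' + suffix name, or None if invalid."""
    try:
        return datetime.strptime(name, '%Y%m%d' + suffix)
    except ValueError:
        return None  # wrong pattern or impossible date


def bucket_by_month(names):
    """Index 0 collects unparseable names; indexes 1-12 hold each month."""
    buckets = [[] for _ in range(12 + 1)]
    for name in names:
        date = parse_date(name)
        buckets[date.month if date else 0].append(name)
    return buckets


if __name__ == '__main__':
    sample = ['20130214' + suffix, '20130230' + suffix, 'readme' + suffix]
    buckets = bucket_by_month(sample)
    print(buckets[2])  # February
    print(buckets[0])  # unrecognized
```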
Upvotes: 0
Reputation: 134
You can do:
x = [i for i in results if i[4:6] == '01']
This lists all filenames for January, assuming all your files follow the format described in the question.
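To go through January, then February, and so on, the same comprehension can be wrapped in a loop over the month numbers. A minimal sketch, assuming results holds bare filenames as built in the question; files_by_month is a hypothetical helper name:

```python
def files_by_month(results):
    """Yield (month, names) pairs for January through December in order."""
    for month in range(1, 13):
        mm = '%02d' % month  # zero-pad so 1 becomes '01', matching the filename field
        yield month, [i for i in results if i[4:6] == mm]


if __name__ == '__main__':
    results = ['20130105.faifb1p16m2.nc', '20130211.faifb1p16m2.nc']
    for month, names in files_by_month(results):
        if names:
            print(month, names)
```

Scanning the list twelve times is fine for a year of daily files; for much larger listings a single grouping pass would be cheaper.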
Upvotes: 1