user3346361

Reputation: 35

select files from path

I have files in a particular path and need to select them one by one based on the file name (yyyymmdd.faifb1p16m2.nc), where yyyy is the year, mm is the month, and dd is the day. I wrote this code:

import os

base_dir = 'C:/DATA2013'
# Collect every file in base_dir whose name ends with the expected suffix.
results = [each for each in os.listdir(base_dir)
           if each.endswith('.faifb1p16m2.nc')]

What should I do next to select only the files for January, then February, and so on? Thank you.

Upvotes: 2

Views: 201

Answers (4)

nakkulable

Reputation: 38

Try this:

import os

base_dir = 'C:/local'
for f in os.listdir(base_dir):
    # Characters 4-5 of the name hold the month; change '01' to pick another month.
    if f.endswith('.faifb1p16m2.nc') and f[4:6] == '01':
        print(f)

Upvotes: 0

alvonellos

Reputation: 1062

Two regexes:

  1. \d{4}(?:\d?|\d{2})(?:\d?|\d{2})\.faifb1p16m2\.nc
  2. \d{8}\.faifb1p16m2\.nc

Sample data:

  1. 20140131.faifb1p16m2.nc
  2. 2014131.faifb1p16m2.nc
  3. 201412.faifb1p16m2.nc
  4. 201411.faifb1p16m2.nc
  5. 20141212.faifb1p16m2.nc
  6. 2014121.faifb1p16m2.nc
  7. 201411.faifb1p16m2.nc

The first regex will match all 7 of those entries. The second regex will match only entries 1 and 5. I probably made the regexes way more complicated than I needed to.

You're going to want the second regex; I'm only listing the first as an example.

from glob import glob
import re

re1 = r'\d{4}(?:\d?|\d{2})(?:\d?|\d{2})\.faifb1p16m2\.nc'
re2 = r'\d{8}\.faifb1p16m2\.nc'

l = [f for f in glob('*.faifb1p16m2.nc') if re.search(re1, f)]
m = [f for f in glob('*.faifb1p16m2.nc') if re.search(re2, f)]

print(l)
print()
print(m)
# Then, suppose you want to select everything with '12' (December) in the list m
print(list(filter(lambda x: x[4:6] == '12', m)))
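The match counts claimed for the sample data can be checked without touching the filesystem. This is only a sketch: the names loose, strict, and december are mine, not part of the answer, and it assumes Python 3.4+ for fullmatch. It also shows that re.search is unanchored, so a name with extra text around the pattern still matches unless you anchor it.

```python
import re

# Sample names copied from the list in this answer.
samples = [
    '20140131.faifb1p16m2.nc',
    '2014131.faifb1p16m2.nc',
    '201412.faifb1p16m2.nc',
    '201411.faifb1p16m2.nc',
    '20141212.faifb1p16m2.nc',
    '2014121.faifb1p16m2.nc',
    '201411.faifb1p16m2.nc',
]

loose = re.compile(r'\d{4}(?:\d?|\d{2})(?:\d?|\d{2})\.faifb1p16m2\.nc')
strict = re.compile(r'\d{8}\.faifb1p16m2\.nc')

loose_hits = [name for name in samples if loose.search(name)]
strict_hits = [name for name in samples if strict.search(name)]

# Month filter on the strictly matched names: characters 4-5 hold mm.
december = [name for name in strict_hits if name[4:6] == '12']

# search() finds the pattern anywhere in the string; fullmatch() requires
# the whole name to fit, which rejects stray prefixes and suffixes.
odd_name = 'x20140131.faifb1p16m2.nc.bak'
found_by_search = strict.search(odd_name) is not None
found_by_fullmatch = strict.fullmatch(odd_name) is not None
```

If stray names like odd_name can appear in the directory, using fullmatch (or adding ^ and $ to the pattern) is probably the safer choice.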

As another similar solution shows you can ditch glob for os.listdir(), so:

l = [f for f in glob('*.faifb1p16m2.nc') if re.search(re1, f)]

Becomes:

l = [f for f in os.listdir() if re.search(re1, f)]

The rest of the code stays the same (remember to import os). One advantage of glob is that it also offers iglob, which behaves like glob but returns an iterator instead of building the whole list up front; that can help when a directory holds many files.
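For what it's worth, here is a rough sketch of the iglob variant combined with the month filter. The function name iter_month_files and the constants are made up for illustration, and it assumes the directory name itself contains no glob wildcard characters.

```python
import os
import re
from glob import iglob

SUFFIX = '.faifb1p16m2.nc'
# Exactly eight digits followed by the suffix.
NAME_RE = re.compile(r'\d{8}' + re.escape(SUFFIX))

def iter_month_files(directory, month):
    """Lazily yield paths in `directory` named yyyymmdd<SUFFIX> whose mm equals `month`."""
    wanted = '%02d' % month  # e.g. 1 -> '01'
    for path in iglob(os.path.join(directory, '*' + SUFFIX)):
        name = os.path.basename(path)
        if NAME_RE.fullmatch(name) and name[4:6] == wanted:
            yield path
```

Something like list(iter_month_files('C:/DATA2013', 12)) would then collect the December paths, while a plain for loop over the generator processes one file at a time.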

One more thing: there is another Stack Overflow post with an overview of Python's infamous lambda feature. It's often used with functions such as map, reduce, and filter.

Upvotes: 0

jfs

Reputation: 414235

To validate filenames, you could use the datetime.strptime() method:

#!/usr/bin/env python
import os
from datetime import datetime
from glob import glob

suffix = '.faifb1p16m2.nc'

def parse_date(path):
    try:
        return datetime.strptime(os.path.basename(path), '%Y%m%d' + suffix)
    except ValueError:
        return None # failed to parse


paths_by_month = [[] for _ in range(12 + 1)]
for path in glob(r'C:\DATA2013\*' + suffix): # for each nc-file in the directory
    date = parse_date(path)
    paths_by_month[date.month if date else 0].append(path)  # index 0 collects unparseable names

print(paths_by_month[2]) # February paths
print(paths_by_month[0]) # paths with unrecognized date
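To see what the validation buys you over plain slicing, here is a small filesystem-free sketch. The helper month_of and the sample names are invented for the illustration. It groups names in a dict instead of a list of lists, and shows that an impossible date such as 31 February ends up under None.

```python
from datetime import datetime

SUFFIX = '.faifb1p16m2.nc'

def month_of(name):
    """Return the month (1-12) encoded in `name`, or None if the date does not parse."""
    try:
        return datetime.strptime(name, '%Y%m%d' + SUFFIX).month
    except ValueError:
        return None

# Hypothetical file names: two valid dates, one impossible date, one non-date.
names = [
    '20130131.faifb1p16m2.nc',
    '20130215.faifb1p16m2.nc',
    '20130231.faifb1p16m2.nc',  # 31 February does not exist
    'readme.faifb1p16m2.nc',
]

by_month = {}
for name in names:
    by_month.setdefault(month_of(name), []).append(name)
```

A slice check like name[4:6] == '02' would have accepted 20130231; strptime rejects it because the day is out of range for the month.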

Upvotes: 0

Omsai Jadhav

Reputation: 134

You can do:

x = [i for i in results if i[4:6] == '01']

This lists all file names for January, assuming that all your files follow the format described in the question.
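To go through "January, then February, and so on", the same slice can be applied for each month in turn. A minimal sketch, where the results list below is just a stand-in with made-up names for the one built in the question, and files_by_month is a name I picked:

```python
# Stand-in for the `results` list from the question (hypothetical file names).
results = [
    '20130105.faifb1p16m2.nc',
    '20130120.faifb1p16m2.nc',
    '20130203.faifb1p16m2.nc',
    '20131231.faifb1p16m2.nc',
]

# Map each month number (1-12) to the file names whose mm field matches it.
files_by_month = {
    month: [name for name in results if name[4:6] == '%02d' % month]
    for month in range(1, 13)
}

for month in range(1, 13):
    for name in files_by_month[month]:
        print(month, name)
```

Months with no files simply map to an empty list, so the loop can run over all twelve without special cases.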

Upvotes: 1
