Reputation: 631

select certain files from directory

I have several files in one directory; some are sample measurements and others are references. They look like this:

blablabla_350.dat
blablabla_351.dat
blablabla_352.dat
blablabla_353.dat
...
blablabla_100.dat
blablabla_101.dat
blablabla_102.dat

The files numbered 350 to 353 are my samples; those numbered 100, 101 and 102 are the references. Conveniently, the samples and the references are each numbered consecutively.

I would like to separate them into two different lists, samples and references.

One idea would be something like this (not working yet):

import glob

samples = []
references = []

ref = raw_input("Enter first reference name: ")
num_refs = raw_input("How many references are? ")

ref = sorted(glob.glob(ref+num_refs))

samples = sorted(glob.glob(*.dat)) not in references

So the reference list would take the first name specified plus the subsequent ones (as many as the number given). All the rest would be samples. Any ideas on how to write this in Python?

Upvotes: 0

Answers (4)

Chris Mueller

Reputation: 6680

You can also do it without glob by using the os and re modules:

import os, re

files = os.listdir(r'C:\path\to\files')
samples, references = [], []
for file in files:
    if re.search(r'blablabla_1\d{2}', file):
        references.append(file)
    elif re.search(r'blablabla_3\d{2}', file):
        samples.append(file)
    else:
        print('{0} is neither sample nor reference'.format(file))
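If the leading digit is all that separates the two groups, the same split can also be written with shell-style patterns instead of regular expressions. The sketch below swaps in the standard-library fnmatch module, which the answer above does not use; the hard-coded names list is a hypothetical stand-in for the result of os.listdir(...), and the code targets Python 3.

```python
import fnmatch

# Hypothetical directory listing, standing in for os.listdir(...).
names = [
    "blablabla_100.dat", "blablabla_101.dat", "blablabla_102.dat",
    "blablabla_350.dat", "blablabla_351.dat", "notes.txt",
]

# [0-9] matches exactly one digit, so each pattern requires three digits.
references = fnmatch.filter(names, "blablabla_1[0-9][0-9].dat")
samples = fnmatch.filter(names, "blablabla_3[0-9][0-9].dat")

# Anything matched by neither pattern is reported separately.
others = [n for n in names if n not in references and n not in samples]

print(references)
print(samples)
print(others)
```

Unlike re.search, an fnmatch pattern has to match the whole file name, so a stray name such as blablabla_100.dat.bak would end up in others.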

Upvotes: 0

Kreet.

Reputation: 151

Try something like this:

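import glob
import re

samples = []
references = []

ref = raw_input("Enter first reference name: ")   # e.g. blablabla_100.dat
num_refs = int(raw_input("How many references are? "))

# Split the first reference name into prefix, number and extension
prefix, first, ext = re.match(r'(.*?)(\d+)(\.\w+)$', ref).groups()

# Build the consecutive reference names, keeping any zero padding
for number in range(int(first), int(first) + num_refs):
    references.append(prefix + str(number).zfill(len(first)) + ext)

for filename in sorted(glob.glob('*.dat')):
    if filename not in references:
        samples.append(filename)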

Upvotes: -1

Robᵩ

Reputation: 168626

You can use glob.glob('*.dat') to get a list of all of the files and then slice that list according to your criteria. The slice will begin at the index of the first reference name, and be as large as the number of references.

Extract that slice to get your references. Delete that slice to get your samples.

import glob

samples = []
references = []

ref = raw_input("Enter first reference name: ")        # blablabla_100.dat
num_refs = int(raw_input("How many references are? ")) # 3

all_files = sorted(glob.glob('*.dat'))
first_ref = all_files.index(ref)
ref_files = all_files[first_ref:first_ref+num_refs]

sample_files = all_files
del sample_files[first_ref:first_ref+num_refs]
del all_files

print ref_files, sample_files

Result:

['blablabla_100.dat', 'blablabla_101.dat', 'blablabla_102.dat'] ['blablabla_350.dat', 'blablabla_351.dat', 'blablabla_352.dat', 'blablabla_353.dat']
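The slicing idea can also be sketched as a small function that leaves the input list untouched and runs on Python 3. The name split_by_slice and the hard-coded names list are assumptions made for illustration; in practice the list would come from glob.glob('*.dat').

```python
def split_by_slice(all_files, first_ref, num_refs):
    """Return (references, samples) by slicing the sorted file list.

    split_by_slice is a hypothetical helper name, not part of the answer.
    """
    ordered = sorted(all_files)
    start = ordered.index(first_ref)  # raises ValueError if the name is absent
    stop = start + num_refs
    references = ordered[start:stop]
    # Everything before and after the slice is a sample.
    samples = ordered[:start] + ordered[stop:]
    return references, samples


# Hypothetical listing, standing in for glob.glob('*.dat').
names = [
    "blablabla_350.dat", "blablabla_351.dat", "blablabla_352.dat",
    "blablabla_353.dat", "blablabla_100.dat", "blablabla_101.dat",
    "blablabla_102.dat",
]

refs, smpls = split_by_slice(names, "blablabla_100.dat", 3)
print(refs)
print(smpls)
```

Building samples from the two remaining slices avoids deleting from a list that another name still refers to.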

Upvotes: 2

nwk

Reputation: 4050

You can use glob.glob to get the list of all *.dat files then filter that list using a list comprehension with a conditional. In my solution I use a regular expression to extract the number from the filename as text. I then convert it to an integer and check if that integer lies between ref_from and ref_to. This works even if some of the reference files numbered between ref_from and ref_to are missing.

The list of samples is obtained through a set operation: it is the result of removing the set of references from the set of data_files. We can do this because every filename can be assumed to be unique.

import glob
import re

samples = []
references = []

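# The references in the question are numbered 100 to 102
ref_from = 100
ref_to = 102

def ref_filter(filename):
    # Extract the number before ".dat" and test whether it is in the reference range
    return ref_from <= int(re.search(r'_([0-9]+)\.dat$', filename).group(1)) <= ref_to

data_files = sorted(glob.glob("*.dat"))
references = [filename for filename in data_files if ref_filter(filename)]
# A set difference loses the ordering, so sort the result again
samples = sorted(set(data_files) - set(references))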

print references
print samples

Alternatively, if you know that all reference files numbered between ref_from and ref_to are going to be present, you can get rid of the function ref_filter and replace

references = [filename for filename in data_files if ref_filter(filename)]

with

references = ['blablabla_' + str(n) + '.dat' for n in xrange(ref_from, ref_to + 1)]
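As a rough sketch, the same numeric range filter can build both lists in a single pass, which keeps them sorted and treats a name without a trailing number as a sample instead of raising an error. The function name split_by_range, the NUMBER_RE constant and the sample names list are assumptions for illustration; the code targets Python 3.

```python
import re

# Captures the digits between the last underscore and ".dat".
NUMBER_RE = re.compile(r"_(\d+)\.dat$")


def split_by_range(filenames, ref_from, ref_to):
    """Return (references, samples); hypothetical helper for illustration."""
    references, samples = [], []
    for name in sorted(filenames):
        match = NUMBER_RE.search(name)
        if match and ref_from <= int(match.group(1)) <= ref_to:
            references.append(name)
        else:
            # Out-of-range numbers and names without a number land here.
            samples.append(name)
    return references, samples


# Hypothetical listing with reference 101 missing.
names = [
    "blablabla_350.dat", "blablabla_351.dat",
    "blablabla_100.dat", "blablabla_102.dat", "notes.dat",
]

references, samples = split_by_range(names, 100, 102)
print(references)
print(samples)
```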

Upvotes: 2
