Reputation: 631
I have several files in the same directory. Some of them are sample measurements and others are references. They look like this:
blablabla_350.dat
blablabla_351.dat
blablabla_352.dat
blablabla_353.dat
...
blablabla_100.dat
blablabla_101.dat
blablabla_102.dat
The ones ending in 350 to 353 are my samples; the ones ending in 100, 101 and 102 are the references. The good thing is that the samples and the references are each numbered consecutively.
I would like to separate them into two different lists, samples and references.
One idea would be something like this (not working yet):
import glob
samples = []
references = []
ref = raw_input("Enter first reference name: ")
num_refs = raw_input("How many references are? ")
ref = sorted(glob.glob(ref+num_refs))
samples = sorted(glob.glob(*.dat)) not in references
So the reference list will contain the first name specified plus the files that follow it, up to the number specified. All the rest will be samples. Any ideas how to do this in Python?
Upvotes: 0
Views: 1301
Reputation: 6680
You can also do it without glob by using the os package:
import os
import re

files = os.listdir(r'C:\path\to\files')
samples, references = [], []
for filename in files:
    # References are numbered 1xx, samples 3xx.
    if re.search(r'blablabla_1\d{2}', filename):
        references.append(filename)
    elif re.search(r'blablabla_3\d{2}', filename):
        samples.append(filename)
    else:
        print('{0} is neither sample nor reference'.format(filename))
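The same idea can be wrapped in a function that returns the groups instead of printing. This is only a sketch in Python 3 (the thread itself uses Python 2); the `PATTERNS` table, the `classify` name and the 1xx/3xx numbering are assumptions taken from the example file names, and the temporary directory exists only so the demo runs anywhere.

```python
import os
import re
import tempfile

# Assumed numbering: references are 1xx, samples are 3xx.
PATTERNS = {
    "references": re.compile(r"blablabla_1\d{2}\.dat"),
    "samples": re.compile(r"blablabla_3\d{2}\.dat"),
}


def classify(directory):
    """Group the file names in `directory` by the first pattern they fully match."""
    groups = {label: [] for label in PATTERNS}
    groups["other"] = []
    for name in sorted(os.listdir(directory)):
        for label, pattern in PATTERNS.items():
            if pattern.fullmatch(name):
                groups[label].append(name)
                break
        else:
            # No pattern matched this name.
            groups["other"].append(name)
    return groups


# Demo on a throwaway directory with three empty files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("blablabla_100.dat", "blablabla_350.dat", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    result = classify(tmp)
print(result)
```

Using `fullmatch` rather than `search` means a name such as `old_blablabla_100.dat.bak` would land in "other" instead of being counted as a reference.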
Upvotes: 0
Reputation: 151
Try something like this:
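import glob
import re

samples = []
references = []
ref = raw_input("Enter first reference name: ")  # e.g. blablabla_100.dat
num_refs = int(raw_input("How many references are there? "))
# Split the name into prefix, number and extension so the number can be
# incremented (assumes the numbers are not zero-padded).
prefix, first, ext = re.match(r'(.*?)(\d+)(\.\w+)$', ref).groups()
for number in range(int(first), int(first) + num_refs):
    references.append('{0}{1}{2}'.format(prefix, number, ext))
for filename in sorted(glob.glob('*.dat')):
    if filename not in references:
        samples.append(filename)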
Upvotes: -1
Reputation: 168626
You can use glob.glob('*.dat') to get a list of all of the files and then slice that list according to your criteria. The slice will begin at the index of the first reference name, and be as large as the number of references.
Extract that slice to get your references. Delete that slice to get your samples.
import glob
samples = []
references = []
ref = raw_input("Enter first reference name: ") # blablabla_100.dat
num_refs = int(raw_input("How many references are? ")) # 3
all_files = sorted(glob.glob('*.dat'))
first_ref = all_files.index(ref)
ref_files = all_files[first_ref:first_ref+num_refs]
sample_files = all_files
del sample_files[first_ref:first_ref+num_refs]
del all_files
print ref_files, sample_files
Result:
['blablabla_100.dat', 'blablabla_101.dat', 'blablabla_102.dat'] ['blablabla_350.dat', 'blablabla_351.dat', 'blablabla_352.dat', 'blablabla_353.dat']
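The slicing step can also be written as a small function that leaves its input untouched. This is a sketch in Python 3 rather than the Python 2 used in the thread; `split_by_slice` is a hypothetical name, and the file names are passed in as a list so the example runs without touching the disk. It relies on plain string sorting, which keeps the numbers in order only while they all have the same number of digits.

```python
def split_by_slice(all_files, first_ref, num_refs):
    """Split file names into (references, samples) by slicing a sorted list.

    `first_ref` must be present in `all_files`; list.index raises
    ValueError otherwise.
    """
    ordered = sorted(all_files)
    start = ordered.index(first_ref)
    end = start + num_refs
    references = ordered[start:end]
    # Join what lies before and after the slice instead of deleting in place.
    samples = ordered[:start] + ordered[end:]
    return references, samples


names = ["blablabla_350.dat", "blablabla_100.dat", "blablabla_351.dat",
         "blablabla_101.dat", "blablabla_102.dat"]
refs, samples = split_by_slice(names, "blablabla_100.dat", 3)
print(refs)
print(samples)
```

In real use the list would come from `glob.glob('*.dat')`, as in the code above.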
Upvotes: 2
Reputation: 4050
You can use glob.glob to get the list of all *.dat files, then filter that list using a list comprehension with a conditional. In my solution I use a regular expression to extract the number from the filename as text. I then convert it to an integer and check whether that integer lies between ref_from and ref_to. This works even if some of the reference files numbered between ref_from and ref_to are missing.
The list of samples is obtained through a set operation: it is the result of removing the set of references from the set of data_files. We can do this since every filename can be assumed to be unique.
import glob
import re

# The references are numbered 100 to 102 in the question.
ref_from = 100
ref_to = 102

def ref_filter(filename):
    # Extract the number before the extension and test it against the range.
    return ref_from <= int(re.search(r'_([0-9]+)\.dat', filename).group(1)) <= ref_to

data_files = sorted(glob.glob("*.dat"))
references = [filename for filename in data_files if ref_filter(filename)]
# A set difference does not preserve order, so sort the result.
samples = sorted(set(data_files) - set(references))
print references
print samples
Alternatively, if you know all references between ref_from and ref_to are going to be present, you can get rid of the function ref_filter and replace
references = [filename for filename in data_files if ref_filter(filename)]
with
references = ['blablabla_' + str(n) + '.dat' for n in xrange(ref_from, ref_to + 1)]
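A single pass over the names can also build both lists at once, which keeps them sorted and skips names that carry no number. The following is a sketch in Python 3 (the answer above is Python 2); `split_by_number` and `NUMBER_RE` are hypothetical names, and the 100 to 102 range is taken from the question.

```python
import re

# Number between the last underscore and the .dat extension.
NUMBER_RE = re.compile(r"_([0-9]+)\.dat$")


def split_by_number(filenames, ref_from, ref_to):
    """Return (references, samples); a name is a reference when its
    number lies in the inclusive range [ref_from, ref_to]."""
    references, samples = [], []
    for name in sorted(filenames):
        match = NUMBER_RE.search(name)
        if match is None:
            continue  # skip names without a trailing number
        if ref_from <= int(match.group(1)) <= ref_to:
            references.append(name)
        else:
            samples.append(name)
    return references, samples


names = ["blablabla_352.dat", "blablabla_100.dat", "blablabla_350.dat",
         "blablabla_102.dat", "readme.dat"]
refs, samples = split_by_number(names, 100, 102)
print(refs)
print(samples)
```

Because the comparison is numeric, a missing file inside the range (here `blablabla_101.dat`) does not affect the result.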
Upvotes: 2