Reputation: 2993
I have data files for which I know the beginning and the end of the name. The names are structured as ###_random_string.EXT, where ### is a number from 000 to 999 and EXT is the file extension (here csv). There can be many files, which is why I'm using Python code to process them (smoothing, filtering, plotting, ...). The code will use numpy.genfromtxt to load the data from each file. I will call numpy.genfromtxt in a loop over a list of numbers (FilesNum) corresponding to the files to be processed. I would like to construct the filename from only the ### given by FilesNum and the extension. Here is a start:
import numpy as np
import glob
import re
FilesNum = range(0, 350, 2)
EXT = 'csv'
X, Y = [], []
for num in FilesNum:
    data = np.genfromtxt(glob.glob(str(num) + '*' + EXT), delimiter = ';')
    X.append(data[:, 0])
    Y.append(data[:, 1])
My problem is that glob.glob(str(num) + '*' + EXT) does not do what I need, because it returns a list. In my specific case, exactly one file corresponds to each number. With that in mind, I need code that replaces '*' with the exact missing part of the file name.
If the file starting with 0 is '000_random_string.csv' :
np.genfromtxt(glob.glob('000_' + '*' + '.csv'), delimiter = ',')
---------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
<ipython-input-24-fefca52f40e1> in <module>()
----> 1 np.genfromtxt(glob.glob('%03d' % 0 + '*' + '.csv'), delimiter = ',')
/usr/lib64/python2.7/site-packages/numpy/lib/npyio.pyc in genfromtxt(fname, dtype, comments, delimiter, skiprows, skip_header, skip_footer, converters, missing, missing_values, filling_values, usecols, names, excludelist, deletechars, replace_space, autostrip, case_sensitive, defaultfmt, unpack, usemask, loose, invalid_raise)
1294 # Skip the first `skip_header` rows
1295 for i in xrange(skip_header):
-> 1296 fhd.next()
1297
1298 # Keep on until we find the first valid values
StopIteration:
While :
np.genfromtxt('000_random_string.csv', delimiter = ',')
Out[30]:
array([[ 350. , -210. ],
[ 351.4 , -210. ],
[ 352.8 , -42.608],
...,
[ 1747.2 , -62.798],
[ 1748.6 , -210. ],
[ 1750. , -210. ]])
note that :
glob.glob('%03d' % 0 + '*' + '.csv')
Out[31]: ['000_random_string.csv']
Upvotes: 0
Views: 1353
Reputation: 2573
Ok, I think we are close to a solution here:
I think the %-formatting tip that Martin supplied gets you there (however, if I understood you correctly, you still need glob if the random strings differ). So your files are 0000_randomstring1.csv, 0000_randomstring2.csv ... 0000_randomstringN.csv, 0002_randomstring1.csv, 0002_randomstring2.csv ... 0002_randomstringN.csv ... 0350_randomstringN.csv, right? And you want the ones with the same number to be read as if they were one file, right?
Then this should work:
import numpy as np
import glob
import re
FilesNum = range(0, 350, 2)
EXT = 'csv'
X, Y = [], []
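for num in FilesNum:
    # "%04d" assumes four-digit prefixes; use "%03d" for names like 000_random_string.csv
    matches = sorted(glob.glob("%04d*%s" % (num, EXT)))
    # genfromtxt reads a list of strings as lines of data, not as file names,
    # so load each matching file separately and stack the rows
    data = np.vstack([np.genfromtxt(f, delimiter=';') for f in matches]).T
    X.append(data[0])
    Y.append(data[1])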
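To see why the zero-padded format string matters, here is a small sketch comparing it with plain str(num). The helper name make_pattern and its default width of 3 are assumptions for illustration, based on the 000 to 999 naming described in the question. fnmatch is used only because it applies the same wildcard rules as glob without touching the disk.

```python
import fnmatch


def make_pattern(num, ext="csv", width=3):
    """Build a glob pattern such as '002_*.csv' (hypothetical helper)."""
    # Zero-pad the number to `width` digits; glob fills in the middle part.
    return "%s_*.%s" % (str(num).zfill(width), ext)


unpadded = str(2) + "*" + "csv"   # '2*csv'
padded = make_pattern(2)          # '002_*.csv'

# The unpadded pattern also matches files that belong to other numbers.
print(fnmatch.fnmatch("200_other.csv", unpadded))        # True
print(fnmatch.fnmatch("200_other.csv", padded))          # False
print(fnmatch.fnmatch("002_random_string.csv", padded))  # True
```

Including the underscore in the pattern also keeps a prefix such as 002 from matching a longer number by accident.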
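If you just want all files read into one large array, you could just as well do:
AllFilesAsList = sorted(glob.glob( "0*.csv" ))
# load each file on its own (a list passed to genfromtxt is read as data lines), then stack the rows
X, Y = np.vstack([np.genfromtxt(f, delimiter = ';') for f in AllFilesAsList]).T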
Oh, now I get it:
So what I'd recommend doing then is:
# flat list of matching file names, one glob per number
# ("%04d" assumes four-digit prefixes; use "%03d" for names like 000_random_string.csv)
Files = [f for num in range(0, 350, 2) for f in glob.glob("%04d*%s" % (num, EXT))]
Now you can actually look at all the files it finds, and your loop becomes more readable:
for f in Files:
    data = np.genfromtxt(f, delimiter = ';').T
    X.append(data[0])
    Y.append(data[1])
Upvotes: 0
Reputation: 2993
OK, I found my solution. I only needed to index the result of glob.glob(). As I said, only one file exists for each number, so index 0 always gives what I want here.
np.genfromtxt(glob.glob('000_' + '*' + '.csv')[0], delimiter = ',')
                                               ^
                                               |-- the solution
Out[5]:
array([[ 350. , -210. ],
[ 351.4 , -210. ],
[ 352.8 , -42.608],
...,
[ 1747.2 , -62.798],
[ 1748.6 , -210. ],
[ 1750. , -210. ]])
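For completeness, here is a minimal sketch of the original loop with that index applied. It is not a definitive implementation: the function names find_single_file and load_columns are hypothetical, the ';' default delimiter is taken from the loop in the question, and the explicit check for exactly one match is an added assumption rather than something the [0] one-liner does. As far as I know, genfromtxt also accepts a list of strings and reads them as lines of data rather than as file names, which is presumably why passing the whole glob result failed.

```python
import glob

import numpy as np


def find_single_file(num, ext="csv"):
    """Return the one file named like '007_<anything>.csv' in the current directory."""
    pattern = "%03d_*.%s" % (num, ext)
    matches = glob.glob(pattern)
    # Assumption: each number maps to exactly one file, so fail loudly otherwise.
    if len(matches) != 1:
        raise ValueError("expected exactly one match for %r, found %d"
                         % (pattern, len(matches)))
    return matches[0]


def load_columns(file_numbers, delimiter=";"):
    """Load the first two columns of each numbered file into two lists."""
    X, Y = [], []
    for num in file_numbers:
        data = np.genfromtxt(find_single_file(num), delimiter=delimiter)
        X.append(data[:, 0])
        Y.append(data[:, 1])
    return X, Y
```

With the files from the question this would be called as X, Y = load_columns(range(0, 350, 2)). A missing or duplicated number then raises a ValueError instead of silently picking an arbitrary file.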
Upvotes: 1
Reputation: 1070
You are missing the path to the directory containing the files.
Furthermore, it is unnecessary to call glob.glob on the filename.
f_name = '.'.join([str(num), ext])
should do it. It converts the file number into a string and joins it with the file extension (the separator is the dot).
Then the complete path to the file is:
import os
f_path = os.path.join(dir_path, f_name)
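Since the file names in the question contain an unknown middle part, the directory path can also be combined with a glob pattern rather than with a complete file name. The following is only a sketch under that assumption: files_for_number is a hypothetical helper, and dir_path and the three-digit prefix follow the names and numbering used in this thread.

```python
import glob
import os


def files_for_number(num, dir_path=".", ext="csv"):
    """List the files in dir_path named like '007_<anything>.csv' (hypothetical helper)."""
    # Join the directory with the pattern, not with a finished file name.
    pattern = os.path.join(dir_path, "%03d_*.%s" % (num, ext))
    # glob returns paths that include dir_path; sorted() makes the order stable.
    return sorted(glob.glob(pattern))
```

The returned paths can be passed to np.genfromtxt one at a time; os.path.basename recovers the bare file name if it is needed.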
Edit: Thanks for the comment. I should have googled first, now I get the question (probably). Will leave it as it is.
Upvotes: 0