user1850133

Reputation: 2993

numpy pass partly defined filename to genfromtxt

I have data files for which I know only the beginning and the end of the name. The names are structured as ###_random_string.EXT, where ### is a number from 000 to 999 and EXT is the file extension (here csv). There can be many files, which is why I am using Python code to process them (smoothing, filtering, plotting, ...). The code will use numpy.genfromtxt to load the data of each file, inside a loop over a list of numbers (FilesNum) corresponding to the files to be processed. I would like to construct each filename from only the ### given by FilesNum and the extension. Here is a start:

import numpy as np
import glob
import re

FilesNum = range(0, 350, 2)
EXT = 'csv'
X, Y = [], []
for num in FilesNum:
    data = np.genfromtxt(glob.glob(str(num) + '*' + EXT), delimiter = ';')
    X.append(data[:, 0])
    Y.append(data[:, 1])

My problem here is that glob.glob(str(num) + '*' + EXT) does not do what I need, because it returns a list. In my specific case, only one file corresponds to each number. Given that, I need code that replaces '*' with the exact missing part of the file name.

If the file starting with 0 is '000_random_string.csv' :

np.genfromtxt(glob.glob('000_' + '*' + '.csv'), delimiter = ',')
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-24-fefca52f40e1> in <module>()
----> 1 np.genfromtxt(glob.glob('%03d' % 0 + '*' + '.csv'), delimiter = ',')

/usr/lib64/python2.7/site-packages/numpy/lib/npyio.pyc in genfromtxt(fname, dtype, comments, delimiter, skiprows, skip_header, skip_footer, converters, missing, missing_values, filling_values, usecols, names, excludelist, deletechars, replace_space, autostrip, case_sensitive, defaultfmt, unpack, usemask, loose, invalid_raise)
   1294     # Skip the first `skip_header` rows
   1295     for i in xrange(skip_header):
-> 1296         fhd.next()
   1297 
   1298     # Keep on until we find the first valid values

StopIteration: 

While :

np.genfromtxt('000_random_string.csv', delimiter = ',')
Out[30]: 
array([[  350.   ,  -210.   ],
       [  351.4  ,  -210.   ],
       [  352.8  ,   -42.608],
       ..., 
       [ 1747.2  ,   -62.798],
       [ 1748.6  ,  -210.   ],
       [ 1750.   ,  -210.   ]])

note that :

glob.glob('%03d' % 0 + '*' + '.csv')
Out[31]: ['000_random_string.csv']

Upvotes: 0

Views: 1353

Answers (3)

Magellan88

Reputation: 2573

Ok, I think we are close to a solution here:

I think the %-formatting tip that Martin supplied gets you there (although, if I understood you correctly, you still need glob if the random strings differ). So your files are 0000_randomstring1.csv, 0000_randomstring2.csv ... 0000_randomstringN.csv, 0002_randomstring1.csv, 0002_randomstring2.csv ... 0002_randomstringN.csv ... 0350_randomstringN.csv, right? And you want the ones with the same number to be read as if they were one file, right?

Then this should work:

import numpy as np
import glob
import re

FilesNum = range(0, 350, 2)
EXT = 'csv'
X, Y = [], []
for num in FilesNum:
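    # genfromtxt takes one filename at a time, so load each match and stack the rows
    data = np.concatenate([np.genfromtxt(f, delimiter = ';') for f in glob.glob( "%04d*%s"%( num, EXT ) )]).T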
    X.append( data[0] )
    Y.append( data[1] )

If you just want all files to be read into one large array, you could just as well do

AllFilesAsList = glob.glob( "0*.csv" )
X,Y = np.concatenate([np.genfromtxt(f, delimiter = ';') for f in AllFilesAsList]).T

Oh, now I get it:

So what I'd recommend doing is

Files = [ f for num in range(0,350,2) for f in glob.glob( "%04d*%s"%( num, EXT ) ) ]

Now you can actually look at all the files it finds, and your loop becomes more readable:

for f in Files:
    data = np.genfromtxt( f, delimiter = ';').T
    X.append( data[0] )
    Y.append( data[1] )

Upvotes: 0

user1850133

Reputation: 2993

OK, I found my solution. I only needed to index the list returned by glob.glob(). As I said, only one file exists for each number, so index 0 will always give what I want here.

np.genfromtxt(glob.glob('000_' + '*' + '.csv')[0], delimiter = ',')
                                               ^
                                               |-- the solution
Out[5]: 
array([[  350.   ,  -210.   ],
       [  351.4  ,  -210.   ],
       [  352.8  ,   -42.608],
       ..., 
       [ 1747.2  ,   -62.798],
       [ 1748.6  ,  -210.   ],
       [ 1750.   ,  -210.   ]])
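
Put back into the loop from the question, this could look like the sketch below. The function name load_columns is made up for illustration, the default delimiter ';' is taken from the loop in the question, and the check that exactly one file matches is an extra safeguard rather than something the one-liner above does.

```python
import glob

import numpy as np


def load_columns(numbers, ext="csv", delimiter=";"):
    """Load the first two columns of every file named '<3-digit number>_*.<ext>'.

    Looks in the current working directory and assumes each file has at
    least two rows and two columns.
    """
    X, Y = [], []
    for num in numbers:
        pattern = "%03d_*.%s" % (num, ext)
        matches = glob.glob(pattern)
        # Indexing with [0] is only safe when exactly one file matches
        if len(matches) != 1:
            raise ValueError(
                "expected exactly one file for %r, found %d" % (pattern, len(matches))
            )
        data = np.genfromtxt(matches[0], delimiter=delimiter)
        X.append(data[:, 0])
        Y.append(data[:, 1])
    return X, Y
```

With the files from the question, X, Y = load_columns(range(0, 350, 2)) should fill the two lists, and a missing or duplicated number raises an error instead of failing inside genfromtxt.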

Upvotes: 1

Martin

Reputation: 1070

You are missing the path to the directory of the files.

Furthermore, it is unnecessary to call glob.glob on the filename.

f_name = '.'.join([str(num), ext])

should do it. It converts the file_number into string and concatenates it with the file extension (the separator is the dot).

Then the complete path to the file is:

import os
f_path = os.path.join(dir_path, f_name)
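
If the middle part of the name is unknown, the directory path can still be combined with a wildcard pattern. The following is only a sketch: resolve_path is a hypothetical helper, and the '<3-digit number>_*.<ext>' layout is taken from the question.

```python
import glob
import os


def resolve_path(dir_path, num, ext="csv"):
    """Return the first file in dir_path named '<3-digit num>_*.<ext>', or None."""
    # Joining the directory into the pattern makes glob return full paths
    pattern = os.path.join(dir_path, "%03d_*.%s" % (num, ext))
    matches = sorted(glob.glob(pattern))
    return matches[0] if matches else None
```

The returned string already contains dir_path, so it can be passed straight to np.genfromtxt.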

Edit: Thanks for the comment. I should have googled first, now I get the question (probably). Will leave it as it is.

Upvotes: 0
