jothebo

Reputation: 41

Python load large number of files

I'm trying to load a large number of files saved in the Ensight gold format into a numpy array. To handle the reading I've written my own class, libvec, which reads the geometry file and then preallocates the arrays that Python will use to store the data, as shown in the code below.

N = len(file_list)
# Create the class object and read geometry file
gvec = vec.libvec(os.path.join(current_dir,casefile))
x,y,z = gvec.xyz()

# Preallocate arrays
U_temp = np.zeros((len(y),len(x),N),dtype=np.dtype('f4'))
V_temp = np.zeros((len(y),len(x),N),dtype=np.dtype('f4'))
u_temp = np.zeros((len(y),len(x),N),dtype=np.dtype('f4'))
v_temp = np.zeros((len(y),len(x),N),dtype=np.dtype('f4'))

# Read the individual files into the previously allocated arrays
for idx,current_file in enumerate(file_list):
    U,V = gvec.readvec(os.path.join(current_dir,current_file))
    U_temp[:,:,idx] = U
    V_temp[:,:,idx] = V

    del U,V

However, this takes seemingly forever, so I was wondering if you have any idea how to speed up this process. The code that reads the individual files into the array structure is shown below:

def readvec(self,filename):
    # We assume for now that the naming scheme (PIV__vxy.case, PIV__vxy.geo) does not
    # change; if it does, the corresponding file handling has to be adapted.
    data_temp = np.loadtxt(filename, dtype=np.dtype('f4'), delimiter=None, converters=None, skiprows=4)

    # U values: outer loop over the y (row) index, inner loop over the x (column) index
    for i in range(len(self.__y)):
        for j in range(len(self.__x)):
            self.__U[i,j] = data_temp[i*len(self.__x)+j]

    # V values: stored in the file directly after the U block
    for i in range(len(self.__y)):
        for j in range(len(self.__x)):
            self.__V[i,j] = data_temp[len(self.__x)*len(self.__y)+i*len(self.__x)+j]

    # W values: only present for 3D data (more than one z coordinate)
    if len(self.__z)>1:
        for i in range(len(self.__y)):
            for j in range(len(self.__x)):
                self.__W[i,j] = data_temp[2*len(self.__x)*len(self.__y)+i*len(self.__x)+j]

        return self.__U,self.__V,self.__W
    else:
        return self.__U,self.__V

Thanks a lot in advance and best regards,

J

Upvotes: 0

Views: 128

Answers (1)

M4rtini

Reputation: 13539

It's a bit hard to say without any test input/output to compare against, but I think this would give you the same U/V arrays as your nested for loops in readvec. This method should be considerably faster than the for loops.

U = data[:size_x*size_y].reshape(size_y, size_x)
V = data[size_x*size_y:2*size_x*size_y].reshape(size_y, size_x)

Returning these directly into U_temp and V_temp should also help. Right now you're making about three copies of your data to get it into U_temp and V_temp:

  1. From file to data_temp
  2. From data_temp to self.__U/V
  3. From U/V into U_temp/V_temp

Although my guess is that the two nested for loops, accessing one element at a time, are what's causing the slowness.
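
A minimal sketch of how that could look, assuming the file layout from the question (all U values first, then all V values) and using nx, ny as stand-ins for len(self.__x) and len(self.__y). readvec_fast is just a hypothetical standalone version of readvec for illustration:

import os
import numpy as np

def readvec_fast(filename, nx, ny):
    # Load the whole file once; skiprows=4 matches the original readvec.
    data = np.loadtxt(filename, dtype='f4', skiprows=4)
    # The original loops fill U[i,j] = data[i*nx + j], which is exactly a
    # (ny, nx) reshape of the first nx*ny values; V is the next block.
    U = data[:nx*ny].reshape(ny, nx)
    V = data[nx*ny:2*nx*ny].reshape(ny, nx)
    return U, V

# Writing straight into the preallocated arrays then avoids the extra copies
# through self.__U / self.__V (file_list, current_dir, x, y as in the question):
# for idx, current_file in enumerate(file_list):
#     path = os.path.join(current_dir, current_file)
#     U_temp[:,:,idx], V_temp[:,:,idx] = readvec_fast(path, len(x), len(y))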

Upvotes: 1
