jms547

Reputation: 283

python loadtxt from many files, appending into the same numpy arrays

I'm new to Python and want the most pythonic way of solving the following basic problem:

I have many plain-text data files file.00001, file.00002, ..., file.99999 and each file has a single line, with numeric data stored in e.g. four columns. I want to read each file sequentially and append the data into one array per column, so in the end I want the arrays arr0, arr1, arr2, arr3 each with shape=(99999,) containing all the data from the appropriate column in all the files.

Later on I want to do lots of math with these arrays so I need to make sure that their entries are contiguous in memory. My naive solution is:

import numpy as np
fnumber = 99999
fnums = np.arange(1, fnumber+1)

arr0 = np.full_like(fnums, np.nan, dtype=np.double)
arr1 = np.full_like(fnums, np.nan, dtype=np.double)
arr2 = np.full_like(fnums, np.nan, dtype=np.double)
arr3 = np.full_like(fnums, np.nan, dtype=np.double)
# ...also is there a neat way of doing this??

for fnum in fnums:
    fname = f'path/to/data/folder/file.{fnum:05}'
    arr0[fnum-1], arr1[fnum-1], arr2[fnum-1], arr3[fnum-1] = np.loadtxt(fname, delimiter=' ', unpack=True)

# error checking - in case a file got deleted or something
all_arrs = (arr0, arr1, arr2, arr3)
if np.isnan(all_arrs).any():
    print("CUIDADO HAY NANS!!!!\nLOOK OUT, THERE ARE NANS!!!!")

It strikes me that this is very C-thinking and there probably is a more pythonic way of doing it. But my feeling is that methods like numpy.concatenate and numpy.insert would either not result in arrays with their contents contiguous in memory, or involve deep copies of each array at every step in the for loop, which would probably melt my laptop.

Is there a more pythonic way?

Upvotes: 1

Views: 982

Answers (1)

hpaulj

Reputation: 231375

Try:

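import numpy as np

fnums = np.arange(1, 99999 + 1)      # same file numbers as in the question
alist = []
for fnum in fnums:
    fname = f'path/to/data/folder/file.{fnum:05}'
    alist.append(np.loadtxt(fname))  # a one-line file gives a 1-D array, one value per column
arr = np.array(alist)                # 2-D array: one row per file, one column per data column
# arr = np.vstack(alist)    # alternative
print(arr.shape)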

Assuming the files all have the same number of columns, either np.array or np.vstack should work. The result is a single 2-D array with one row per file and one column per data column, which you can split into four 1-D arrays if needed.
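
If the four separate arrays need to be contiguous in memory, one possible way to do the split is sketched below. This is only an illustration under some assumptions: the helper name load_columns, its ncols parameter and the temporary demo folder are made up for the example and are not part of the question. It preallocates a NaN-filled 2-D array so that a missing file leaves a detectable gap, then makes a single transposed copy with np.ascontiguousarray, so each column ends up as a contiguous row.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_columns(folder, fnums, ncols=4):
    """Hypothetical helper: read one-line files file.00001, ... from folder
    and return an array of shape (ncols, number of files)."""
    # Preallocate with NaN so that a missing file leaves a row of NaNs.
    data = np.full((len(fnums), ncols), np.nan)
    for i, fnum in enumerate(fnums):
        fname = Path(folder) / f"file.{fnum:05}"
        if fname.exists():
            data[i] = np.loadtxt(fname)  # 1-D array with ncols values
    # data.T is only a strided view; ascontiguousarray copies it once,
    # after which every row (one per original column) is contiguous.
    return np.ascontiguousarray(data.T)


# Demo: write three small files to a temporary folder, then ask for four,
# so that file.00004 plays the role of a deleted file.
with tempfile.TemporaryDirectory() as folder:
    for fnum in (1, 2, 3):
        line = f"{fnum} {fnum * 10} {fnum * 100} {fnum * 1000}\n"
        (Path(folder) / f"file.{fnum:05}").write_text(line)
    arr0, arr1, arr2, arr3 = load_columns(folder, range(1, 5))

print(arr1)
print(arr1.flags["C_CONTIGUOUS"])  # contiguity of one of the split arrays
print(np.isnan(arr0).any())        # NaN check, as in the question
```

The copy happens once at the end rather than once per file, which is the cost the question was worried about with np.concatenate inside the loop.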

Upvotes: 2
