Reputation: 283
I'm new to python and want the most pythonic way of solving the following basic problem:
I have many plain-text data files file.00001
, file.00002
, ..., file.99999
and each file has a single line, with numeric data stored in e.g. four columns. I want to read each file sequentially and append the data into one array per column, so in the end I want the arrays arr0
, arr1
, arr2
, arr3
each with shape=(99999,)
containing all the data from the appropriate column in all the files.
Later on I want to do lots of math with these arrays so I need to make sure that their entries are contiguous in memory. My naive solution is:
import numpy as np
fnumber = 99999
fnums = np.arange(1, fnumber+1)
arr0 = np.full_like(fnums, np.nan, dtype=np.double)
arr1 = np.full_like(fnums, np.nan, dtype=np.double)
arr2 = np.full_like(fnums, np.nan, dtype=np.double)
arr3 = np.full_like(fnums, np.nan, dtype=np.double)
# ...also is there a neat way of doing this??
for fnum in fnums:
fname = f'path/to/data/folder/file.{fnum:05}'
arr0[fnum-1], arr1[fnum-1], arr2[fnum-1], arr3[fnum-1] = np.loadtxt(fname, delimiter=' ', unpack=True)
# error checking - in case a file got deleted or something
all_arrs = (arr0, arr1, arr2, arr3)
if np.isnan(all_arrs).any():
print("CUIDADO HAY NANS!!!!\nLOOK OUT, THERE ARE NANS!!!!")
It strikes me that this is very C-thinking and there probably is a more pythonic way of doing it. But my feeling is that methods like numpy.concatenate
and numpy.insert
would either not result in arrays with their contents contiguous in memory, or involve deep copies of each array at every step in the for loop, which would probably melt my laptop.
Is there a more pythonic way?
Upvotes: 1
Views: 982
Reputation: 231375
Try:
alist = []
for fnum in fnums:
fname = f'path/to/data/folder/file.{fnum:05}'
alist.append(np.loadtxt(fname))
arr = np.array(alist)
# arr = np.vstack(alist) # alternative
print(arr.shape)
Assuming the files all have the same number of columns, one of these should work. The result will be one array, which you could separate into 4 if needed.
Upvotes: 2