nass9801

Reputation: 339

How to append many numpy files into one numpy file in python

I am trying to combine many numpy files into one big numpy file. I tried to follow these two links: Append multiple numpy files to one big numpy file in python, and Python append multiple files in given order to one big file. This is what I did:

import matplotlib.pyplot as plt 
import numpy as np
import glob
import os, sys
fpath ="/home/user/Desktop/OutFileTraces.npy"
npyfilespath ="/home/user/Desktop/test"   
os.chdir(npyfilespath)
with open(fpath,'wb') as f_handle:
    for npfile in glob.glob("*.npy"):
        # Find the path of the file
        filepath = os.path.join(npyfilespath, npfile)
        print filepath
        # Load file
        dataArray= np.load(filepath)
        print dataArray
        np.save(f_handle,dataArray)
        dataArray= np.load(fpath)
        print dataArray

An example of the result that I have:

/home/user/Desktop/Trace=96
[[ 0.01518007  0.01499514  0.01479736 ..., -0.00392216 -0.0039761
  -0.00402747]]
[[-0.00824758 -0.0081808  -0.00811402 ..., -0.0077236  -0.00765425
  -0.00762086]]
/home/user/Desktop/Trace=97
[[ 0.00614908  0.00581004  0.00549154 ..., -0.00814741 -0.00813457
  -0.00809347]]
[[-0.00824758 -0.0081808  -0.00811402 ..., -0.0077236  -0.00765425
  -0.00762086]]
/home/user/Desktop/Trace=98
[[-0.00291786 -0.00309509 -0.00329287 ..., -0.00809861 -0.00797789
  -0.00784175]]
[[-0.00824758 -0.0081808  -0.00811402 ..., -0.0077236  -0.00765425
  -0.00762086]]
/home/user/Desktop/Trace=99
[[-0.00379887 -0.00410453 -0.00438963 ..., -0.03497837 -0.0353842
  -0.03575151]]
[[-0.00824758 -0.0081808  -0.00811402 ..., -0.0077236  -0.00765425
  -0.00762086]

this line represents the first trace:

[[-0.00824758 -0.0081808  -0.00811402 ..., -0.0077236  -0.00765425
      -0.00762086]]

It is repeated all the time.

I asked the second question two days ago. At first I thought I had the best answer, but after trying to print and plot the final file 'OutFileTraces.npy' I found that my code:

1/ doesn't print the numpy files from the folder 'test' in their order (trace0, trace1, trace2, ...)

2/ saves only one trace in the file: when I print or plot OutFileTraces.npy, I find just one trace, and it is the first one.

So I need to correct my code, because I am really stuck. I would be very grateful if you could help me.

Thanks in advance.

Upvotes: 5

Views: 15898

Answers (2)

hpaulj

Reputation: 231375

As discussed in

loading arrays saved using numpy.save in append mode

it is possible to save multiple times to an open file, and it is possible to load multiple times from it. That's not documented, and probably not preferred, but it works. A savez archive is the preferred method for saving multiple arrays.

Here's a toy example:

In [777]: with open('multisave.npy','wb') as f:
     ...:     arr = np.arange(10)
     ...:     np.save(f, arr)
     ...:     arr = np.arange(20)
     ...:     np.save(f, arr)
     ...:     arr = np.ones((3,4))
     ...:     np.save(f, arr)
     ...:     
In [778]: ll multisave.npy
-rw-rw-r-- 1 paul 456 Feb 13 08:38 multisave.npy
In [779]: with open('multisave.npy','rb') as f:
     ...:     arr = np.load(f)
     ...:     print(arr)
     ...:     print(np.load(f))
     ...:     print(np.load(f))
     ...:     
[0 1 2 3 4 5 6 7 8 9]
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
[[ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]]
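
If the number of arrays in such a file is not known in advance, one way to read them all back is to keep calling np.load until the read position reaches the end of the file. This is only a sketch that relies on the same undocumented behaviour as the toy example; the helper name load_all is made up for illustration.

```python
import os
import tempfile

import numpy as np


def load_all(path):
    """Return every array written to `path` by successive np.save calls.

    `load_all` is a hypothetical helper, not part of NumPy.
    """
    arrays = []
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        # Keep loading until the read position reaches the end of the file.
        while f.tell() < size:
            arrays.append(np.load(f))
    return arrays


# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "multisave.npy")
    with open(path, "wb") as f:
        np.save(f, np.arange(10))
        np.save(f, np.arange(20))
        np.save(f, np.ones((3, 4)))
    loaded = load_all(path)

print([a.shape for a in loaded])
```

Comparing the position with the file size avoids depending on which exception np.load raises when no data is left.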

Here's a simple example of saving a list of arrays of the same shape:

In [780]: traces = [np.arange(10),np.arange(10,20),np.arange(100,110)]
In [781]: traces
Out[781]: 
[array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
 array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19]),
 array([100, 101, 102, 103, 104, 105, 106, 107, 108, 109])]
In [782]: arr = np.array(traces)
In [783]: arr
Out[783]: 
array([[  0,   1,   2,   3,   4,   5,   6,   7,   8,   9],
       [ 10,  11,  12,  13,  14,  15,  16,  17,  18,  19],
       [100, 101, 102, 103, 104, 105, 106, 107, 108, 109]])

In [785]: np.save('mult1.npy', arr)

In [786]: data = np.load('mult1.npy')
In [787]: data
Out[787]: 
array([[  0,   1,   2,   3,   4,   5,   6,   7,   8,   9],
       [ 10,  11,  12,  13,  14,  15,  16,  17,  18,  19],
       [100, 101, 102, 103, 104, 105, 106, 107, 108, 109]])
In [788]: list(data)
Out[788]: 
[array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
 array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19]),
 array([100, 101, 102, 103, 104, 105, 106, 107, 108, 109])]
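
For completeness, here is a rough sketch of the savez route mentioned at the top, using the same three toy traces. The key names trace0, trace1, ... and the file name traces.npz are assumptions chosen for the example.

```python
import os
import tempfile

import numpy as np

traces = [np.arange(10), np.arange(10, 20), np.arange(100, 110)]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "traces.npz")
    # Keyword names become the keys of the archive ("trace0", ... are made up).
    np.savez(path, **{"trace%d" % i: t for i, t in enumerate(traces)})

    # np.load on an .npz file returns a dict-like archive of named arrays.
    with np.load(path) as archive:
        names = sorted(archive.files)
        restored = [archive[name] for name in names]

print(names)
```

Unlike the stacked array, the arrays in an archive do not have to share a shape, and each one is read only when its key is accessed.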

Upvotes: 2

Pierre de Buyl

Reputation: 7293

  1. glob.glob returns file names in no particular order. You need to sort the list explicitly, on a separate line, because list.sort() works in place and does not return the list.

    npfiles = glob.glob("*.npy")
    npfiles.sort()
    for npfile in npfiles:
        ...
    
  2. NumPy files contain a single array. If you want to store several arrays in a single file, you may have a look at .npz files with np.savez: https://docs.scipy.org/doc/numpy/reference/generated/numpy.savez.html#numpy.savez I have not seen this widely used, so you may wish to seriously consider alternatives.

    1. If your arrays are all of the same shape and store related data, you can make a larger array. Say that the current shape is (N_1, N_2) and that you have N_0 such arrays. A loop with

      all_arrays = []
      for npfile in npfiles:
          all_arrays.append(np.load(os.path.join(npyfilespath, npfile)))
      all_arrays = np.array(all_arrays)
      np.save(f_handle, all_arrays)
      

      will produce a file with a single array of shape (N_0, N_1, N_2)

    2. If you need per-name access to the arrays, HDF5 files are a good match. See http://www.h5py.org/ (a full intro is too much for a SO reply, see the quick start guide http://docs.h5py.org/en/latest/quick.html)
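
Putting points 1 and 2.1 together, a possible end-to-end sketch follows. It assumes every file holds an array of the same shape. The helper names natural_key and merge_npy_files and the trace file names are made up for the demonstration. A plain sort() is lexicographic, so trace10 would come before trace2; the key below is one way to get numeric order.

```python
import glob
import os
import re
import tempfile

import numpy as np


def natural_key(name):
    """Split a name into text and integer chunks so 'trace10' sorts after 'trace2'."""
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", name)]


def merge_npy_files(folder, out_path):
    """Load every .npy file in `folder` in natural order and save one stacked array.

    Assumes all arrays share a shape. Keep `out_path` outside `folder`,
    otherwise a second run would pick up the merged file as an input.
    """
    npfiles = sorted(glob.glob(os.path.join(folder, "*.npy")), key=natural_key)
    all_arrays = np.array([np.load(npfile) for npfile in npfiles])
    np.save(out_path, all_arrays)
    return npfiles, all_arrays


# Demonstration with three small fake traces of shape (1, 5).
with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, "test")
    os.mkdir(folder)
    for i in (10, 2, 1):
        np.save(os.path.join(folder, "trace%d.npy" % i), np.full((1, 5), i))
    out_path = os.path.join(tmp, "OutFileTraces.npy")
    npfiles, _ = merge_npy_files(folder, out_path)
    merged = np.load(out_path)

print([os.path.basename(p) for p in npfiles])
print(merged.shape)
```

The merged file then holds a single array whose first axis indexes the traces, so merged[k] is the k-th trace in sorted order.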

Upvotes: 4
