user2323596

Reputation: 533

Storing and loading numpy arrays as files

In my program I work with many numpy arrays of varying sizes, and I need to store them in XML files for later use. I chose not to write them to binary files so that all my data stays in one place (the XML file) instead of being scattered across 200 files.

So I tried using numpy's array_str() function to convert each array into a string. The resulting XML looks like this:

<Test date="2013-07-10-17:19">
    <Neurons>5</Neurons>
    <Errors>[7.7642140551985428e-06, 7.7639131137987232e-06]</Errors>
    <Iterations>5000</Iterations>
    <Weights1>[[ 0.99845902 -0.70780512 0.26981375 -0.6077122 0.09639695] [ 0.61856711 -0.74684913 0.20099992 0.99725171 -0.41826754] [ 0.79964397 0.56620812 -0.64055346 -0.50572793 -0.50100635]]</Weights1>
    <Weights2>[[-0.1851452 -0.22036027] [ 0.19293429 -0.1374252 ] [-0.27638478 -0.38660974] [ 0.30441414 -0.01531598] [-0.02478953 0.01823584]]</Weights2>
</Test>

The Weights elements hold the values I want to store. The problem is that numpy's fromstring() function apparently can't read them back: I get "ValueError: string size must be a multiple of element size".

I write them with "np.array_str(w1)" and try to read them with "np.fromstring(w_str1)". Even when reading works, the result is apparently a 1-D array, so I have to restore the shape manually. That is already a pain, since I would also have to store the shape somewhere.
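For context on the error: with its default sep='', np.fromstring() treats its argument as raw binary data rather than text, so a text string whose length is not a multiple of 8 bytes (the size of a float64) raises exactly this ValueError. Passing sep=' ' switches it to parsing text, but it expects plain numbers without the brackets that array_str() adds, and it always returns a 1-D array. A minimal sketch of that text route, assuming the values are written without brackets and the shape is recorded separately (for example in an XML attribute):

```python
import numpy as np

w1 = np.array([[0.99845902, -0.70780512, 0.26981375],
               [0.61856711, -0.74684913, 0.20099992]])

# Write the values as plain whitespace-separated text (no brackets).
# repr() of a Python float gives the shortest string that round-trips exactly.
text = " ".join(repr(x) for x in w1.ravel().tolist())
shape = w1.shape  # must be stored separately, e.g. as an XML attribute

# sep=" " selects np.fromstring's text-parsing mode; the result is always 1-D.
flat = np.fromstring(text, dtype=np.float64, sep=" ")
w1_back = flat.reshape(shape)
```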

What is the best way to do this properly? Preferably one that also saves my array's shape and datatype without manual housekeeping for every little thing.

Upvotes: 8

Views: 7756

Answers (4)

Muhammad Umar Farooq

Reputation: 521

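You can use numpy.ndarray.tostring() to convert the array into a string (actually a bytes object). Newer NumPy versions deprecate tostring() in favour of the identical tobytes(), and binary-mode fromstring() in favour of frombuffer().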

You can later read the array back with numpy.fromstring(), as long as you pass the original dtype and reshape to the original shape:

In [138]: x = np.arange(12).reshape(3,4)

In [139]: x.tostring()
Out[139]: '\x00\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x04\x00\x00\x00\x05\x00\x00\x00\x06\x00\x00\x00\x07\x00\x00\x00\x08\x00\x00\x00\t\x00\x00\x00\n\x00\x00\x00\x0b\x00\x00\x00'

In [140]: np.fromstring(x.tostring(), dtype=x.dtype).reshape(x.shape)
Out[140]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
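A sketch of the same round trip with the current API names, adapted for XML: raw bytes can contain NUL and other characters that XML 1.0 does not allow, so this version Base64-encodes them first (the base64 step is an addition, not part of the answer above). As in the answer, dtype and shape must be stored separately; here the dtype is kept as its .str code so the byte order travels with it.

```python
import base64
import numpy as np

x = np.arange(12).reshape(3, 4)

# tobytes() returns the raw data buffer (tostring() is its deprecated alias).
raw = x.tobytes()
# Encode to ASCII text that is safe to place inside an XML element.
encoded = base64.b64encode(raw).decode("ascii")

# dtype (including byte order) and shape are not in the bytes; keep them too,
# e.g. as XML attributes.
dtype_str, shape = x.dtype.str, x.shape

# Reverse the steps: decode, reinterpret the bytes, restore the shape.
decoded = base64.b64decode(encoded)
x_back = np.frombuffer(decoded, dtype=np.dtype(dtype_str)).reshape(shape)
```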

Upvotes: 0

heltonbiker

Reputation: 27575

If you really want to keep the XML formatting you had, my suggestion is to use the json module to convert between ndarray and string.

Check the following:

import json
import numpy

w1 = numpy.array([[ 0.99845902, -0.70780512, 0.26981375, -0.6077122, 0.09639695],
                  [ 0.61856711, -0.74684913, 0.20099992, 0.99725171, -0.41826754],
                  [ 0.79964397, 0.56620812, -0.64055346, -0.50572793, -0.50100635]])

print(w1)
print()

#####

# tolist() turns the array into nested Python lists, which json can serialize
w1string = json.dumps(w1.tolist())

## NOW YOU COULD PASS "w1string" TO/FROM XML

#####

print(w1string)
print()

# json.loads gives nested lists back; numpy.array rebuilds the 2-D array
w1back = numpy.array(json.loads(w1string))

print(w1back)
print()
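tolist() already preserves the nesting, but the dtype is lost (a float32 array comes back as float64), and so is the shape of an empty array. A minimal sketch that records both next to the data; array_to_json and array_from_json are hypothetical helper names, not part of json or numpy:

```python
import json
import numpy as np

def array_to_json(a):
    # Hypothetical helper: store dtype (with byte order) and shape explicitly
    # alongside the nested-list data.
    return json.dumps({"dtype": a.dtype.str, "shape": list(a.shape),
                       "data": a.tolist()})

def array_from_json(s):
    # Hypothetical helper: rebuild with the recorded dtype, then the shape
    # (needed for empty arrays, whose nesting is lost by tolist()).
    d = json.loads(s)
    return np.array(d["data"], dtype=d["dtype"]).reshape(d["shape"])

w1 = np.array([[0.5, -0.25, 1.0], [2.0, 0.125, -3.5]], dtype=np.float32)
w1string = array_to_json(w1)
w1back = array_from_json(w1string)
```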

Upvotes: 1

Jaime

Reputation: 67427

Unfortunately, there is no easy way to read your current output back into numpy. The result won't look as nice in your XML file, but you could create a readable version of your arrays with np.savetxt():

>>> import io
>>> import numpy as np
>>> a = np.array([[ 0.99845902, -0.70780512, 0.26981375, -0.6077122, 0.09639695], [ 0.61856711, -0.74684913, 0.20099992, 0.99725171, -0.41826754], [ 0.79964397, 0.56620812, -0.64055346, -0.50572793, -0.50100635]])
>>> out_f = io.StringIO()
>>> np.savetxt(out_f, a, delimiter=',')
>>> out_f.getvalue()
'9.984590199999999749e-01,-7.078051199999999543e-01,2.698137500000000188e-01,-6.077122000000000357e-01,9.639694999999999514e-02\n6.185671099999999756e-01,-7.468491299999999722e-01,2.009999199999999986e-01,9.972517100000000134e-01,-4.182675399999999932e-01\n7.996439699999999817e-01,5.662081199999999814e-01,-6.405534600000000189e-01,-5.057279300000000477e-01,-5.010063500000000447e-01\n'

And load it back as:

>>> in_f = io.StringIO(out_f.getvalue())
>>> np.loadtxt(in_f, delimiter=',')
array([[ 0.99845902, -0.70780512,  0.26981375, -0.6077122 ,  0.09639695],
       [ 0.61856711, -0.74684913,  0.20099992,  0.99725171, -0.41826754],
       [ 0.79964397,  0.56620812, -0.64055346, -0.50572793, -0.50100635]])
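Building on this, the text can go straight into an XML element, with the dtype kept as an attribute so it does not have to be tracked by hand. A sketch using the standard library's xml.etree.ElementTree, assuming 2-D floating-point arrays (np.savetxt only writes 1-D or 2-D data, and its default '%.18e' format is meant for floats); array_to_element and element_to_array are hypothetical helpers:

```python
import io
import xml.etree.ElementTree as ET
import numpy as np

def array_to_element(tag, a):
    """Hypothetical helper: store a 2-D float array as CSV text in an XML element."""
    buf = io.StringIO()
    np.savetxt(buf, a, delimiter=",")  # default fmt="%.18e" keeps full float64 precision
    elem = ET.Element(tag, dtype=a.dtype.str)  # remember the dtype as an attribute
    elem.text = buf.getvalue()
    return elem

def element_to_array(elem):
    """Hypothetical helper: inverse of array_to_element."""
    # ndmin=2 keeps a single-row array 2-D instead of collapsing it to 1-D.
    return np.loadtxt(io.StringIO(elem.text), delimiter=",",
                      dtype=elem.get("dtype"), ndmin=2)

root = ET.Element("Test", date="2013-07-10-17:19")
w1 = np.array([[0.99845902, -0.70780512, 0.26981375],
               [0.61856711, -0.74684913, 0.20099992]])
root.append(array_to_element("Weights1", w1))

# Serialize the whole document, then parse it back and rebuild the array.
xml_text = ET.tostring(root, encoding="unicode")
w1_back = element_to_array(ET.fromstring(xml_text).find("Weights1"))
```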

Upvotes: 3

Saullo G. P. Castro

Reputation: 58885

Numpy provides an easy way to store many arrays in a compressed file:

import numpy as np

a = np.arange(10)
b = np.arange(10)
np.savez_compressed('file.npz', a=a, b=b)

The keyword names become the array names inside the file, so you can choose them freely when saving, for example: np.savez_compressed('file.npz', newa=a, newb=b).
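Because the names are ordinary keyword arguments, a whole dictionary of arrays can be unpacked into one call, which keeps many arrays (such as the 200 files mentioned in the question) in a single archive. A sketch with made-up random weights and a temporary path, both purely illustrative:

```python
import os
import tempfile
import numpy as np

# Illustrative weights, keyed by the name each array gets inside the archive.
weights = {f"run{i}_w1": np.random.rand(3, 5) for i in range(3)}

path = os.path.join(tempfile.mkdtemp(), "weights.npz")
# ** unpacks the dict into keyword arguments, so every array is stored
# under its key in one compressed .npz file.
np.savez_compressed(path, **weights)
```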

To read the saved file, use np.load(), which returns an NpzFile instance that works like a dictionary:

loaded = np.load('file.npz')

To load the arrays:

a_loaded = loaded['a']
b_loaded = loaded['b']

or:

from operator import itemgetter
g = itemgetter( 'a', 'b' )
a_loaded, b_loaded = g(np.load('file.npz'))
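np.load() keeps the .npz file open until the NpzFile is closed. A sketch using it as a context manager, which also shows that each array's dtype and shape come back without extra bookkeeping (the float32 array and temporary path are illustrative):

```python
import os
import tempfile
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "file.npz")
a = np.arange(10)
b = np.linspace(0.0, 1.0, 6, dtype=np.float32).reshape(2, 3)
np.savez_compressed(path, a=a, b=b)

# The with-block closes the underlying file on exit; arrays read inside
# it are ordinary ndarrays and stay usable afterwards.
with np.load(path) as loaded:
    names = loaded.files
    a_loaded = loaded["a"]
    b_loaded = loaded["b"]
```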

Upvotes: 16
