Reputation: 792
I have defined my own dtype and want to give it save/load functions that operate on text files, but I cannot come up with a proper fmt string for numpy.savetxt(). The problem arises because one of the fields of my dtype is a tuple (two floats in the naive example below), which I think effectively results in an attempt to save a 3D object with savetxt().
It only works when I save the floats with "%s" (but then I cannot loadtxt() them; variant 1 in the code below) or when I introduce an inefficient my_repr() function (variant 2).
I cannot believe that numpy does not provide an efficient formatter/save/load API for custom types. Does anyone have an idea for solving this nicely?
import numpy as np

def main():
    # np.int was removed from recent NumPy releases; np.int64 is used instead
    my_type = np.dtype([('single_int', np.int64),
                        ('two_floats', np.float64, (2,))])
    my_var = np.array([(1, (2., 3.)),
                       (4, (5., 6.))
                       ],
                      dtype=my_type)
    # Verification
    print(my_var)
    print(my_var['two_floats'])
    # Let's try to save and load it in three variants
    variant = 2
    if variant == 0:
        # the line below would not work: "ValueError: fmt has wrong number of % formats: %d %f %f"
        np.savetxt('f.txt', my_var, fmt='%d %f %f')
        # so I don't even try to load
    elif variant == 1:
        # The line below does work, but saves floats between '[]' which makes them not loadable later
        np.savetxt('f.txt', my_var, fmt='%d %s')
        # lines such as "1 [2. 3.]" won't load, the line below raises an Exception
        my_var_loaded = np.loadtxt('f.txt', dtype=my_type)
    elif variant == 2:
        # An ugly workaround:
        def my_repr(o):
            return [(elem['single_int'], *elem['two_floats']) for elem in o]
        # and then the rest works fine:
        np.savetxt('f.txt', my_repr(my_var), fmt='%d %f %f')
        my_var_loaded = np.loadtxt('f.txt', dtype=my_type)
    print('my_var_loaded')
    print(my_var_loaded)

if __name__ == '__main__':
    main()
Upvotes: 0
Views: 1017
Reputation: 231530
In [115]: my_type = np.dtype([('single_int', np.int64),
     ...:                     ('two_floats', np.float64, (2,))])
In [116]: my_var = np.array([(1, (2., 3.)),
     ...:                    (4, (5., 6.))
     ...:                    ],
     ...:                   dtype=my_type)
In [117]: my_var
Out[117]:
array([(1, [2., 3.]), (4, [5., 6.])],
      dtype=[('single_int', '<i8'), ('two_floats', '<f8', (2,))])
Jumping straight to the loading step:
In [118]: txt = """1 2. 3.
...: 4 5. 6."""
In [119]: np.genfromtxt(txt.splitlines(), dtype=my_type)
Out[119]:
array([(1, [2., 3.]), (4, [5., 6.])],
      dtype=[('single_int', '<i8'), ('two_floats', '<f8', (2,))])
As I commented, savetxt is simply doing:
    for row in my_var:
        f.write(fmt % tuple(row))
So we have to, one way or another, work around or with the basic Python % formatting. Either that, or write our own text file. There's nothing magical about savetxt. It's plain Python.
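To make the point about % formatting concrete, here is a minimal sketch of what a record looks like once it is turned into a tuple, and of writing the text by hand. The helper name format_record is hypothetical, io.StringIO stands in for an open text file, and np.int64 is assumed in place of np.int.

```python
import io
import numpy as np

my_type = np.dtype([('single_int', np.int64),
                    ('two_floats', np.float64, (2,))])
my_var = np.array([(1, (2., 3.)), (4, (5., 6.))], dtype=my_type)

# One record unpacks into one item per field, not one item per number,
# which is why a three-specifier fmt does not match a two-field record.
first_row = tuple(my_var[0])
n_items = len(first_row)  # 2: an integer and a length-2 array

def format_record(rec):
    """Hypothetical helper: flatten one record, then apply plain % formatting."""
    return '%d %f %f' % (rec['single_int'], *rec['two_floats'])

# Writing the "file" by hand; io.StringIO stands in for open('f.txt', 'w').
out = io.StringIO()
for rec in my_var:
    out.write(format_record(rec) + '\n')
text = out.getvalue()
```

The flattening step is the same idea as my_repr() in the question, just applied one record at a time.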
===
Recent numpy versions include a function to 'flatten' a structured array:
In [120]: import numpy.lib.recfunctions as rf
In [121]: arr = rf.structured_to_unstructured(my_var)
In [122]: arr
Out[122]:
array([[1., 2., 3.],
       [4., 5., 6.]])
In [123]: np.savetxt('test.csv', arr, fmt='%d %f %f')
In [124]: cat test.csv
1 2.000000 3.000000
4 5.000000 6.000000
In [125]: np.genfromtxt('test.csv', dtype=my_type)
Out[125]:
array([(1, [2., 3.]), (4, [5., 6.])],
      dtype=[('single_int', '<i8'), ('two_floats', '<f8', (2,))])
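The flatten, save, and load steps above could be wrapped into a pair of helpers along these lines. This is only a sketch: save_records and load_records are hypothetical names, an in-memory io.StringIO buffer replaces the file on disk, and np.int64 is assumed in place of np.int.

```python
import io
import numpy as np
import numpy.lib.recfunctions as rf

my_type = np.dtype([('single_int', np.int64),
                    ('two_floats', np.float64, (2,))])
my_var = np.array([(1, (2., 3.)), (4, (5., 6.))], dtype=my_type)

def save_records(target, records, fmt='%d %f %f'):
    """Hypothetical helper: flatten to a plain 2D array, then let savetxt format it."""
    flat = rf.structured_to_unstructured(records)
    np.savetxt(target, flat, fmt=fmt)

def load_records(source, dtype):
    """Hypothetical helper: genfromtxt spreads the columns over the dtype's fields."""
    return np.genfromtxt(source, dtype=dtype)

buf = io.StringIO()          # stands in for a file name or open file
save_records(buf, my_var)
buf.seek(0)
loaded = load_records(buf, my_type)

round_trip_ok = (np.array_equal(loaded['single_int'], my_var['single_int'])
                 and np.allclose(loaded['two_floats'], my_var['two_floats']))
```

Note that the flattened array has a single common dtype (float here), so the integer column passes through floating point before it is written.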
Saving an object dtype array gets around a lot of the formatting issues:
In [182]: my_var
Out[182]:
array([(1, [2., 3.]), (4, [5., 6.])],
      dtype=[('single_int', '<i8'), ('two_floats', '<f8', (2,))])
In [183]: def my_repr(o):
     ...:     return [(elem['single_int'], *elem['two_floats']) for elem in o]
     ...:
In [184]: my_repr(my_var)
Out[184]: [(1, 2.0, 3.0), (4, 5.0, 6.0)]
In [185]: np.array(_,object)
Out[185]:
array([[1, 2.0, 3.0],
       [4, 5.0, 6.0]], dtype=object)
In [186]: np.savetxt('f.txt', _, fmt='%d %f %f')
In [187]: cat f.txt
1 2.000000 3.000000
4 5.000000 6.000000
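One practical difference between the two approaches is what happens to the integer column. The sketch below, which assumes np.int64 in place of np.int and uses a hypothetical to_text helper with an in-memory buffer, compares the float-flattened array with the object array for an integer just above 2**53, where float64 can no longer represent every integer.

```python
import io
import numpy as np
import numpy.lib.recfunctions as rf

my_type = np.dtype([('single_int', np.int64),
                    ('two_floats', np.float64, (2,))])
big = 2**53 + 1   # not exactly representable as float64
recs = np.array([(big, (2., 3.))], dtype=my_type)

# Flattening to a common dtype turns the integer into a float64.
as_float = rf.structured_to_unstructured(recs)
# The object array keeps each element's own type.
as_object = np.array([(r['single_int'], *r['two_floats']) for r in recs],
                     dtype=object)

def to_text(arr, fmt='%d %f %f'):
    """Hypothetical helper: run savetxt into a string instead of a file."""
    buf = io.StringIO()
    np.savetxt(buf, arr, fmt=fmt)
    return buf.getvalue()

float_text = to_text(as_float)
object_text = to_text(as_object)
```

For small integers like those in the question the two outputs are identical, so this only matters when the integer field can hold large values.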
Upvotes: 1