Reputation: 18084
numpy.genfromtxt(infile, dtype=None)
does a pretty good job of determining the number format of each column in my input files. How can I reuse those already-determined types when saving the data with numpy.savetxt()? savetxt uses a very different format syntax.
indata = '''
1000 254092.500 1630087.500 9144.00 9358.96 214.96
422 258667.500 1633267.500 6096.00 6490.28 394.28
15 318337.500 1594192.500 9144.00 10524.28 1380.28
-15 317392.500 1597987.500 6096.00 4081.26 -2014.74
-14 253627.500 1601047.500 21336.00 20127.51 -1208.49
END
'''
code
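import numpy as np
from io import StringIO
header = 'Scaled_Residual,X,Y,Local_Std_Error,Vertical_Std_Error,Unscaled_Residual'
# wrap the string in StringIO so genfromtxt reads it as file contents, not a filename
data = np.genfromtxt(StringIO(indata), names=header, dtype=None,
                     comments='E')  # skip 'END' lines
print(data.dtype)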
emits:
[('Scaled_Residual', '<i4'), ('X', '<f8'), ('Y', '<f8'), ('Local_Std_Error', '<f8'), ('Vertical_Std_Error', '<f8'), ('Unscaled_Residual', '<f8')]
So how can I elegantly translate data.dtype
into the savetxt(..., fmt='%i, %f, ...')
syntax without manually stepping through it? Is there a savefromgentxt() counterpart I haven't discovered?
A simplistic, hopeful attempt at fmt=data.dtype
fails completely. ;-)
np.savetxt('test.csv', data, header=header, delimiter=',',
fmt=data.dtype)
Result:
...snip...\numpy\lib\npyio.py", line 1047, in savetxt
fh.write(asbytes(format % tuple(row) + newline))
UnboundLocalError: local variable 'format' referenced before assignment
Upvotes: 1
Views: 1048
Reputation: 231325
fmt
is supposed to be a format string or a list of format strings. See the examples in the savetxt
documentation. It is not a dtype.
np.savetxt('test.csv',data, fmt='%10s')
gets 90% of the way there:
      1000   254092.5  1630087.5     9144.0    9358.96     214.96
       422   258667.5  1633267.5     6096.0    6490.28     394.28
        15   318337.5  1594192.5     9144.0   10524.28    1380.28
       -15   317392.5  1597987.5     6096.0    4081.26   -2014.74
       -14   253627.5  1601047.5    21336.0   20127.51   -1208.49
You would get closer by specifying a fmt string that gives the width and number of decimals for each column.
np.savetxt('test.csv',data, fmt='%10d %10.3f %10.3f %10.2f %10.2f %10.2f')
does better. You can tweak the fmt
further.
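As a minimal sketch of the "list of strings" form of fmt mentioned above: savetxt accepts one format per column and joins them with the delimiter. The three-column array and the in-memory buffer here are illustrative stand-ins, not the original data file.

```python
import io
import numpy as np

# Two rows in the same spirit as the sample data (illustrative values only).
rows = np.array(
    [(1000, 254092.5, 9144.0), (-15, 317392.5, 6096.0)],
    dtype=[("Scaled_Residual", "i4"), ("X", "f8"), ("Local_Std_Error", "f8")],
)

# One format per column; savetxt joins them with the delimiter.
column_formats = ["%d", "%.3f", "%.2f"]

buffer = io.StringIO()  # in-memory stand-in for a file on disk
np.savetxt(buffer, rows, fmt=column_formats, delimiter=",")
text = buffer.getvalue()
print(text)
```

The list must have exactly as many entries as the array has fields; otherwise savetxt raises an error.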
The Python code for savetxt
is not that complex. I'd suggest looking at it.
The problem with generating anything fancier from the dtype
is that it carries little else: just the kind and byte size of each field, and nothing about column width or decimal places.
In [154]: [x[1] for x in data.dtype.descr]
Out[154]: ['<i4', '<f8', '<f8', '<f8', '<f8', '<f8']
Compare these formats:
In [158]: '%i %f %f %f %f %f'%tuple(data[0])
Out[158]: '1000 254092.500000 1630087.500000 9144.000000 9358.960000 214.960000'
In [159]: '%s %s %s %s %s %s'%tuple(data[0])
Out[159]: '1000 254092.5 1630087.5 9144.0 9358.96 214.96'
In [160]: ' '.join(['%10s']*6)%tuple(data[0])
Out[160]: '      1000   254092.5  1630087.5     9144.0    9358.96     214.96'
A simple translation of the dtype
info:
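def foo(astr):
    # map a dtype code such as '<i4' or '<f8' to a printf-style format
    if 'i' in astr:
        return '%10i'
    elif 'f' in astr:
        return '%10f'
    return '%10s'  # fallback for any other type

[foo(x[1]) for x in data.dtype.descr]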
# ['%10i', '%10f', '%10f', '%10f', '%10f', '%10f']
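A slightly more general variant of the same idea, as a sketch only: it inspects each field's dtype.kind instead of searching the descr string. The helper name formats_from_dtype and the default width and decimals are assumptions chosen for illustration; the dtype itself cannot tell you how many decimals the input had.

```python
import numpy as np

def formats_from_dtype(dtype, width=10, decimals=3):
    """Build one printf-style format per field of a structured dtype.

    Hypothetical helper: width and decimals are caller-chosen defaults,
    not something recovered from the dtype.
    """
    formats = []
    for name in dtype.names:
        kind = dtype[name].kind  # 'i'/'u' for integers, 'f' for floats
        if kind in "iu":
            formats.append("%%%dd" % width)
        elif kind == "f":
            formats.append("%%%d.%df" % (width, decimals))
        else:
            formats.append("%%%ds" % width)  # anything else falls back to %s
    return formats

example_dtype = np.dtype([("a", "i4"), ("b", "f8"), ("c", "U5")])
print(formats_from_dtype(example_dtype))
```

The resulting list can be passed directly as fmt=formats_from_dtype(data.dtype) to savetxt.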
You could also use the dtype
names to generate a header line.
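For instance, a header can be built by joining data.dtype.names. This is a small sketch with a two-field array and an in-memory buffer standing in for the real data and output file; comments='' is used so that savetxt does not prefix the header line with '# '.

```python
import io
import numpy as np

# Illustrative two-field array; the real data has six fields.
data = np.array(
    [(1000, 254092.5), (422, 258667.5)],
    dtype=[("Scaled_Residual", "i4"), ("X", "f8")],
)

# The field names become the header line.
header = ",".join(data.dtype.names)

buffer = io.StringIO()  # in-memory stand-in for a file on disk
np.savetxt(buffer, data, fmt=["%d", "%.3f"], delimiter=",",
           header=header, comments="")
lines = buffer.getvalue().splitlines()
print(lines)
```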
Upvotes: 1