matt wilkie

Reputation: 18084

Numpy: use dtype from genfromtxt() when exporting with savetxt()

numpy.genfromtxt(infile, dtype=None) does a pretty good job of determining the number format of each column in my input files. How can I reuse those already-determined types when saving the data with numpy.savetxt()? savetxt() uses a very different format syntax.

indata = '''
    1000  254092.500 1630087.500  9144.00  9358.96   214.96
     422  258667.500 1633267.500  6096.00  6490.28   394.28
      15  318337.500 1594192.500  9144.00 10524.28  1380.28
     -15  317392.500 1597987.500  6096.00  4081.26 -2014.74
     -14  253627.500 1601047.500 21336.00 20127.51 -1208.49
END
'''

code

import numpy as np
header = 'Scaled_Residual,X,Y,Local_Std_Error,Vertical_Std_Error,Unscaled_Residual'
# genfromtxt expects a file name, file object, or list of lines, so split the string
data = np.genfromtxt(indata.splitlines(), names=header, dtype=None,
    comments='E') #skip 'END' lines

print(data.dtype)

emits:

[('Scaled_Residual', '<i4'), ('X', '<f8'), ('Y', '<f8'), ('Local_Std_Error', '<f8'), ('Vertical_Std_Error', '<f8'), ('Unscaled_Residual', '<f8')]

So how can I elegantly translate data.dtype into the savetxt(... fmt='%i, %f, ...') syntax without manually stepping through it? Is there a savefromgentxt() counterpart I haven't discovered?

A simplistic, hopeful attempt at fmt=data.dtype fails completely. ;-)

np.savetxt('test.csv', data, header=header, delimiter=',',
    fmt=data.dtype)

Result:

  ...snip...\numpy\lib\npyio.py", line 1047, in savetxt
    fh.write(asbytes(format % tuple(row) + newline))
UnboundLocalError: local variable 'format' referenced before assignment

Upvotes: 1

Views: 1048

Answers (1)

hpaulj

Reputation: 231325

fmt is supposed to be a format string, or a list of format strings. See the examples in the savetxt documentation. It is not a dtype.

np.savetxt('test.csv',data, fmt='%10s')

gets 90% of the way there:

  1000   254092.5  1630087.5     9144.0    9358.96     214.96
   422   258667.5  1633267.5     6096.0    6490.28     394.28
    15   318337.5  1594192.5     9144.0   10524.28    1380.28
   -15   317392.5  1597987.5     6096.0    4081.26   -2014.74
   -14   253627.5  1601047.5    21336.0   20127.51   -1208.49

You would get closer by specifying a fmt string with the width and number of decimals for each column.

np.savetxt('test.csv',data, fmt='%10d  %10.3f %10.3f %10.2f %10.2f %10.2f')

does better. You can tweak the fmt further.

The Python code for savetxt is not that complex. I'd suggest looking at it.

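The problem with generating anything fancier from the dtype is that it doesn't carry much more information: it records each column's type and byte size, but nothing about field width or number of decimals.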

In [154]: [x[1] for x in data.dtype.descr]
Out[154]: ['<i4', '<f8', '<f8', '<f8', '<f8', '<f8']

Compare these formats:

In [158]: '%i %f %f %f %f %f'%tuple(data[0])
Out[158]: '1000 254092.500000 1630087.500000 9144.000000 9358.960000 214.960000'

In [159]: '%s %s %s %s %s %s'%tuple(data[0])
Out[159]: '1000 254092.5 1630087.5 9144.0 9358.96 214.96'

In [160]: ' '.join(['%10s']*6)%tuple(data[0])
Out[160]: '      1000   254092.5  1630087.5     9144.0    9358.96     214.96'

A simple translation of the dtype info:

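def foo(astr):
    # astr is a dtype string such as '<i4' or '<f8'
    if 'i' in astr:
        return '%10i'
    elif 'f' in astr:
        return '%10f'
    # fall back to str() formatting for any other type
    return '%10s'
[foo(x[1]) for x in data.dtype.descr]
# ['%10i', '%10f', '%10f', '%10f', '%10f', '%10f']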
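The same translation can be keyed on each field's dtype.kind instead of substring matching, and the resulting list handed straight to savetxt. This is only a sketch: fmt_from_dtype is a hypothetical helper, and the width and precision defaults are assumptions, since the dtype itself does not record them. It writes to an in-memory buffer and uses a small three-column stand-in for the data.

```python
import io
import numpy as np

def fmt_from_dtype(dtype, width=10, precision=3):
    """Hypothetical helper: one savetxt format string per structured field.

    width and precision are assumed defaults; the dtype does not store them.
    """
    fmts = []
    for name in dtype.names:
        kind = dtype[name].kind  # 'i'/'u' for integers, 'f' for floats
        if kind in 'iu':
            fmts.append('%{}d'.format(width))
        elif kind == 'f':
            fmts.append('%{}.{}f'.format(width, precision))
        else:
            fmts.append('%{}s'.format(width))  # fall back to str()
    return fmts

# Small stand-in for the array returned by genfromtxt
data = np.array([(1000, 254092.5, 1630087.5), (-15, 317392.5, 1597987.5)],
                dtype=[('Scaled_Residual', '<i4'), ('X', '<f8'), ('Y', '<f8')])

fmt = fmt_from_dtype(data.dtype)
buf = io.StringIO()
np.savetxt(buf, data, fmt=fmt, delimiter=',')
text = buf.getvalue()
```

Passing a list lets savetxt join the pieces with the delimiter itself; per-column precision would still have to be supplied by hand.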

You could also use the dtype names to generate a header line.
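A minimal sketch of building that header from data.dtype.names, using a two-column stand-in array and an in-memory buffer. The fmt strings are illustrative choices, and comments='' is used so that savetxt does not prefix the header with '# '.

```python
import io
import numpy as np

# Small stand-in for the array returned by genfromtxt
data = np.array([(1000, 254092.5), (422, 258667.5)],
                dtype=[('Scaled_Residual', '<i4'), ('X', '<f8')])

# Field names come straight from the structured dtype
header = ','.join(data.dtype.names)

buf = io.StringIO()
np.savetxt(buf, data, fmt=['%d', '%.3f'], delimiter=',',
           header=header, comments='')
lines = buf.getvalue().splitlines()
```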

Upvotes: 1
