Reputation: 36146
What would be the best way to write multiple numpy arrays of different dtype as different columns of a single CSV file?
For instance, given the following arrays:
array([[1, 2],
       [3, 4],
       [5, 6]])
array([[ 10., 20.],
       [ 30., 40.],
       [ 50., 60.]])
I would like to obtain a file (delimiter irrelevant):
1 2 10.0 20.0
3 4 30.0 40.0
5 6 50.0 60.0
Ideally, I would like to be able to write a list of arrays this way, where the format/dtype can differ for each array.
I tried looking at savetxt, but it's not clear to me how to use it when the arrays have different types.
Upvotes: 2
Views: 2035
Reputation: 231738
In [38]: a=np.arange(1,7).reshape(3,2)
In [39]: b=np.arange(10,70.,10).reshape(3,2)
In [40]: c=np.concatenate((a,b),axis=1)
In [41]: c
Out[41]:
array([[ 1., 2., 10., 20.],
       [ 3., 4., 30., 40.],
       [ 5., 6., 50., 60.]])
After concatenation all values are floats (the int array is upcast), and the default savetxt format is a general float ('%.18e'):
In [43]: np.savetxt('test.csv',c)
In [44]: cat test.csv
1.000000000000000000e+00 2.000000000000000000e+00 1.000000000000000000e+01 2.000000000000000000e+01
3.000000000000000000e+00 4.000000000000000000e+00 3.000000000000000000e+01 4.000000000000000000e+01
5.000000000000000000e+00 6.000000000000000000e+00 5.000000000000000000e+01 6.000000000000000000e+01
With a custom fmt I can get:
In [46]: np.savetxt('test.csv',c,fmt='%2d %2d %5.1f %5.1f')
In [47]: cat test.csv
1 2 10.0 20.0
3 4 30.0 40.0
5 6 50.0 60.0
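savetxt also accepts fmt as a sequence with one format per column, so the per-array formats asked for in the question can be expanded into such a list. A minimal sketch, assuming every array is 2-D with the same number of rows; save_blocks is a hypothetical helper name, not a NumPy function:

```python
import io
import numpy as np

def save_blocks(fname, blocks, delimiter=' '):
    """Write 2-D arrays side by side, using one printf-style format per array.

    `blocks` is a list of (array, fmt) pairs whose arrays share the same
    number of rows. Hypothetical helper, not part of NumPy.
    """
    arrays = []
    fmts = []
    for arr, fmt in blocks:
        arr = np.asarray(arr)
        arrays.append(arr)
        # Repeat this array's format once for each of its columns.
        fmts.extend([fmt] * arr.shape[1])
    # concatenate upcasts to a common dtype (float64 for these inputs);
    # '%d' still prints those whole-number floats as integers.
    stacked = np.concatenate(arrays, axis=1)
    np.savetxt(fname, stacked, fmt=fmts, delimiter=delimiter)

a = np.array([[1, 2], [3, 4], [5, 6]])
b = np.array([[10., 20.], [30., 40.], [50., 60.]])

buf = io.StringIO()  # a filename such as 'test.csv' works too
save_blocks(buf, [(a, '%d'), (b, '%.1f')])
print(buf.getvalue(), end='')
```

Because the arrays are concatenated first, every value passes through float64 before formatting; integers beyond 2**53 could lose precision that way, which the structured-array and row-by-row approaches avoid.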
More generally, we can make c with a compound dtype. It isn't needed here with just floats and ints, but with strings it would matter. But we still need a long fmt to display the columns correctly.
np.rec.fromarrays is an easy way to generate a structured array. Unfortunately it turns each input array into a single field, so for your (3,2) arrays I need to list the columns separately.
In [52]: c = np.rec.fromarrays((a[:,0],a[:,1],b[:,0],b[:,1]))
In [53]: c
Out[53]:
rec.array([(1, 2, 10.0, 20.0), (3, 4, 30.0, 40.0), (5, 6, 50.0, 60.0)],
          dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<f8'), ('f3', '<f8')])
In [54]: np.savetxt('test.csv',c,fmt='%2d %2d %5.1f %5.1f')
In [55]: cat test.csv
1 2 10.0 20.0
3 4 30.0 40.0
5 6 50.0 60.0
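The column listing can be generated for any list of 2-D arrays. A sketch, assuming all arrays have the same number of rows; columns_to_records is a hypothetical helper name. Unlike concatenate, the integer columns keep an integer dtype:

```python
import io
import numpy as np

def columns_to_records(arrays):
    """Build a record array with one field per column of each 2-D input,
    keeping each input's dtype. Hypothetical helper, not part of NumPy."""
    # Iterating over arr.T yields the columns of arr as 1-d arrays.
    columns = [col for arr in arrays for col in np.asarray(arr).T]
    return np.rec.fromarrays(columns)

a = np.array([[1, 2], [3, 4], [5, 6]])
b = np.array([[10., 20.], [30., 40.], [50., 60.]])

c = columns_to_records([a, b])
print(c.dtype)  # integer fields f0, f1 and float fields f2, f3

buf = io.StringIO()
np.savetxt(buf, c, fmt='%d %d %.1f %.1f')
print(buf.getvalue(), end='')
```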
I'm using the same savetxt call.
I could also make a structured array with 2 fields, each holding 2 columns. I'm not sure whether savetxt would work with that.
savetxt essentially iterates over the 1st dimension of your array and does a formatted write on each row, roughly:
for row in arr:
    f.write(fmt % tuple(row))
where fmt is derived from your parameter.
It wouldn't be hard to write your own version that iterates over 2 arrays and does a separate formatted write for each pair of rows:
for r1, r2 in zip(a, b):
    print('%2d %2d' % tuple(r1), '%5.1f %5.1f' % tuple(r2))
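A runnable version of that loop, generalized to a list of arrays with one format string per array and writing to a file object instead of printing. write_side_by_side is a hypothetical name, and the sketch assumes the arrays have the same number of rows (zip silently stops at the shortest one):

```python
import io
import numpy as np

def write_side_by_side(f, arrays, fmts, delimiter=' '):
    """Write row i of every array on line i of `f`, formatting each array's
    row with its own printf-style string. Hypothetical helper; there is no
    concatenation, so each array keeps its own dtype."""
    for rows in zip(*arrays):
        parts = [fmt % tuple(row) for row, fmt in zip(rows, fmts)]
        f.write(delimiter.join(parts) + '\n')

a = np.array([[1, 2], [3, 4], [5, 6]])
b = np.array([[10., 20.], [30., 40.], [50., 60.]])

buf = io.StringIO()  # an open text file, e.g. open('test.csv', 'w'), works the same way
write_side_by_side(buf, [a, b], ['%d %d', '%.1f %.1f'])
print(buf.getvalue(), end='')
```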
===================
Trying a compound dtype
In [60]: np.dtype('2i,2f')
Out[60]: dtype([('f0', '<i4', (2,)), ('f1', '<f4', (2,))])
In [61]: c=np.zeros(a.shape[0], np.dtype('2i,2f'))
In [62]: c['f0']=a
In [63]: c['f1']=b
In [64]: c
Out[64]:
array([([1, 2], [10.0, 20.0]), ([3, 4], [30.0, 40.0]),
       ([5, 6], [50.0, 60.0])],
      dtype=[('f0', '<i4', (2,)), ('f1', '<f4', (2,))])
In [65]: np.savetxt('test.csv',c,fmt='%2d %2d %5.1f %5.1f')
---
ValueError: fmt has wrong number of % formats: %2d %2d %5.1f %5.1f
So writing a compound dtype like this does not work. Considering that a row of c looks like:
In [69]: tuple(c[0])
Out[69]: (array([1, 2], dtype=int32), array([ 10., 20.], dtype=float32))
I shouldn't be surprised.
I can save the two blocks with %s format, but that leaves me with brackets.
In [66]: np.savetxt('test.csv',c,fmt='%s %s')
In [67]: cat test.csv
[1 2] [ 10. 20.]
[3 4] [ 30. 40.]
[5 6] [ 50. 60.]
I think there is a np.rec function that flattens the dtype, but I can also do that with a view:
In [72]: np.savetxt('test.csv',c.view('i,i,f,f'),fmt='%2d %2d %5.1f %5.1f')
In [73]: cat test.csv
1 2 10.0 20.0
3 4 30.0 40.0
5 6 50.0 60.0
So as long as you are dealing with numeric values, the simple concatenate is just as good as the more complex structured approaches.
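The 'i,i,f,f' view works because it has the same byte layout as '2i,2f'. A sketch that derives both dtypes from the input arrays instead of hard-coding them, assuming packed (unaligned) dtypes, which is what np.dtype builds by default; the field names c0, c1, ... are arbitrary:

```python
import io
import numpy as np

a = np.arange(1, 7).reshape(3, 2)
b = np.arange(10, 70., 10).reshape(3, 2)

# Nested dtype: one sub-array field per input array (like '2i,2f' above,
# but taking the element types from the arrays themselves).
nested = np.dtype([('f0', a.dtype, (a.shape[1],)),
                   ('f1', b.dtype, (b.shape[1],))])
c = np.zeros(a.shape[0], dtype=nested)
c['f0'] = a
c['f1'] = b

# Flat dtype with one scalar field per column. Both dtypes are packed
# (no alignment padding), so their byte layouts match and a view is valid.
col_dtypes = [a.dtype] * a.shape[1] + [b.dtype] * b.shape[1]
flat = np.dtype([('c%d' % i, dt) for i, dt in enumerate(col_dtypes)])
assert flat.itemsize == nested.itemsize

buf = io.StringIO()
np.savetxt(buf, c.view(flat), fmt='%d %d %.1f %.1f')
print(buf.getvalue(), end='')
```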
============
Upvotes: 1
Reputation: 107347
Use np.concatenate to join the arrays along the second axis, then use np.savetxt to save the result in text format.
import numpy as np
a = np.array([[1, 2],
              [3, 4],
              [5, 6]])
b = np.array([[10., 20.],
              [30., 40.],
              [50., 60.]])
# concatenate upcasts a to float64, so every column is written with the default '%.18e' format
np.savetxt('filename.csv', np.concatenate((a, b), axis=1))
Note that np.savetxt also accepts other arguments, such as delimiter:
numpy.savetxt(fname, X, fmt='%.18e', delimiter=' ', newline='\n', header='', footer='', comments='# ')
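For an actual comma-separated file, delimiter can be combined with a per-column fmt list and an optional header. A sketch; the header names are made up for illustration, and comments='' keeps savetxt from prefixing the header line with '# ':

```python
import io
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])
b = np.array([[10., 20.], [30., 40.], [50., 60.]])

buf = io.StringIO()  # or a filename such as 'filename.csv'
np.savetxt(buf, np.concatenate((a, b), axis=1),
           fmt=['%d', '%d', '%.1f', '%.1f'],  # one format per column
           delimiter=',',
           header='a0,a1,b0,b1',  # hypothetical column names
           comments='')           # no '# ' prefix on the header line
print(buf.getvalue(), end='')
```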
Upvotes: 1