Reputation: 305
I am new to Python and am trying to convert a 2d numpy array, like:
a=numpy.array([[191.25,0,0,1],[191.251,0,0,1],[191.252,0,0,1]])
to a string in which the column entries are separated by one delimiter '\t' and the rows are separated by another delimiter '\n', with control over the precision of each column, to get:
b='191.250\t0.00\t0\t1\n191.251\t0.00\t0\t1\n191.252\t0.00\t0\t1\n'
First, I create the array by:
import numpy as np
col1=np.arange(191.25,196.275,.001)[:, np.newaxis]
nrows=col1.shape[0]
col2=np.zeros((nrows,1),dtype=int)
col3=np.zeros((nrows,1),dtype=int)
col4=np.ones((nrows,1),dtype=int)
a=np.hstack((col1,col2,col3,col4))
Then I produce b, by one of 2 methods:
Method 1:
b=''
for i in range(0,a.shape[0]):
    for j in range(0,a.shape[1]-1):
        b+=str(a[i,j])+'\t'
    b+=str(a[i,-1])+'\n'
b
Method 2:
b=''
for i in range(0,a.shape[0]):
    b+='\t'.join(['%0.3f' %x for x in a[i,:]])+'\n'
b
However, I'm guessing there are better ways of producing a and b. I am looking for the most efficient ways (i.e. memory, time, code compactness) to create a and b.
Follow-up questions
Thank you Mike,
b = '\n'.join('\t'.join('%0.3f' %x for x in y) for y in a)+'\n'
worked for me, but I have a few follow-up questions (they couldn't fit in the comment section):
Method 1
y=b.split('\n')[:-1]
z=[row.split('\t') for row in y]
a=numpy.array(z,dtype=float)
Method 2
import re
a=numpy.array(list(filter(None,re.split('[\n\t]+',b))),dtype=float).reshape(-1,4)
Is there a better way?
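One candidate worth comparing against the two methods above is np.loadtxt reading from an in-memory text buffer. This is only a sketch: it assumes b is tab-delimited with one row per line, as in the example at the top, and the small b used here is just that example string.

```python
import io
import numpy as np

# Example string from the top of the question (3 rows, 4 tab-separated columns).
b = '191.250\t0.00\t0\t1\n191.251\t0.00\t0\t1\n191.252\t0.00\t0\t1\n'

# io.StringIO wraps the string so that loadtxt can read it like a file.
a = np.loadtxt(io.StringIO(b), delimiter='\t')

print(a.shape)   # (3, 4)
print(a.dtype)   # float64
```

Whether this is faster than the split-based methods for a large b would need to be measured.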
Upvotes: 5
Views: 15741
Reputation: 3
With Python 3 I did it in one line:
str(a).replace('[','').replace(']','').replace('\n',' ')+' '
Output (fixed width):
'191.25 0. 0. 1. 191.251 0. 0. 1. 191.252 0. 0. 1. '
Upvotes: 0
Reputation: 85522
A one-liner will do:
b = '\n'.join('\t'.join('%0.3f' %x for x in y) for y in a)
Using a simpler example:
>>> a = np.arange(25, dtype=float).reshape(5, 5)
>>> a
array([[  0.,   1.,   2.,   3.,   4.],
       [  5.,   6.,   7.,   8.,   9.],
       [ 10.,  11.,  12.,  13.,  14.],
       [ 15.,  16.,  17.,  18.,  19.],
       [ 20.,  21.,  22.,  23.,  24.]])
This:
b = '\n'.join('\t'.join('%0.3f' %x for x in y) for y in a)
print(b)
prints this:
0.000 1.000 2.000 3.000 4.000
5.000 6.000 7.000 8.000 9.000
10.000 11.000 12.000 13.000 14.000
15.000 16.000 17.000 18.000 19.000
20.000 21.000 22.000 23.000 24.000
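The one-liner uses the same format for every column, while the question asks for a separate precision per column. A possible sketch of that, pairing each column with its own format via zip; the fmts tuple is a hypothetical choice picked to reproduce the target string from the question. np.savetxt also accepts a sequence of formats, so it is shown alongside for comparison.

```python
import io
import numpy as np

a = np.array([[191.25, 0, 0, 1],
              [191.251, 0, 0, 1],
              [191.252, 0, 0, 1]])

# Hypothetical per-column formats, one entry per column of a.
fmts = ('%0.3f', '%0.2f', '%d', '%d')

# Same join idea as above, but each value is paired with its column's format.
b = '\n'.join('\t'.join(f % x for f, x in zip(fmts, row)) for row in a) + '\n'
print(repr(b))

# Alternative: let np.savetxt write into an in-memory buffer.
buf = io.StringIO()
np.savetxt(buf, a, fmt=list(fmts), delimiter='\t', newline='\n')
print(buf.getvalue() == b)   # True
```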
You already used a list comprehension in your second method. Here we have a generator expression, which looks almost exactly like a list comprehension. The only syntactical difference is that the [] are replaced by (). A generator expression does not build the list itself but hands a so-called generator to join. In the end it has the same effect, but your code skips the step of building the intermediate list (CPython's join still collects the items internally before concatenating them, so the gain is mostly in compactness rather than memory).
There can be more than one for in such an expression, which makes it nested.
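To make the difference concrete, here is a small sketch using plain lists instead of a NumPy array; the variable names are made up for illustration.

```python
row = [0.0, 1.0, 2.0]

as_list = ['%0.3f' % x for x in row]   # square brackets: a list is built right away
as_gen = ('%0.3f' % x for x in row)    # parentheses: a generator, items produced on demand

print(type(as_list).__name__)   # list
print(type(as_gen).__name__)    # generator

# join accepts either one and produces the same string.
from_list = '\t'.join(as_list)
from_gen = '\t'.join(as_gen)
print(from_list == from_gen)    # True

# Nested form, as in the one-liner: the inner expression formats one row,
# the outer expression walks over the rows.
rows = [[0.0, 1.0], [2.0, 3.0]]
nested = '\n'.join('\t'.join('%0.3f' % x for x in y) for y in rows)
print(repr(nested))             # '0.000\t1.000\n2.000\t3.000'
```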
This:
b = '\n'.join('\t'.join('%0.3f' %x for x in y) for y in a)
is equivalent to:
res = []
for y in a:
    res.append('\t'.join('%0.3f' %x for x in y))
b = '\n'.join(res)
I use %%timeit in the IPython Notebook:
%%timeit
b = '\n'.join('\t'.join('%0.3f' %x for x in y) for y in a)
10 loops, best of 3: 42.4 ms per loop
%%timeit
b=''
for i in range(0,a.shape[0]):
    for j in range(0,a.shape[1]-1):
        b+=str(a[i,j])+'\t'
    b+=str(a[i,-1])+'\n'
10 loops, best of 3: 50.2 ms per loop
%%timeit
b=''
for i in range(0,a.shape[0]):
    b+='\t'.join(['%0.3f' %x for x in a[i,:]])+'\n'
10 loops, best of 3: 43.8 ms per loop
Looks like they are all about the same speed. Actually, += on strings is optimized in CPython; otherwise it would be much slower than the join() approach. Other Python implementations such as Jython or PyPy can show much bigger time differences and can make join() much faster compared to +=.
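Outside a notebook, the comparison can be approximated with the standard timeit module. This is only a sketch: rows is a hypothetical plain-list stand-in for a, the function names are made up, and the absolute timings depend on the machine and interpreter, so no numbers are quoted here.

```python
import timeit

# Hypothetical stand-in data: 1000 rows of 4 floats each.
rows = [[float(i), 0.0, 0.0, 1.0] for i in range(1000)]

def with_join():
    # Nested generator expressions handed to join, as in the one-liner.
    return '\n'.join('\t'.join('%0.3f' % x for x in y) for y in rows) + '\n'

def with_concat():
    # Repeated += on a string, as in the question's second method.
    b = ''
    for y in rows:
        b += '\t'.join(['%0.3f' % x for x in y]) + '\n'
    return b

# Both approaches should build exactly the same string.
print(with_join() == with_concat())   # True

t_join = timeit.timeit(with_join, number=20)
t_concat = timeit.timeit(with_concat, number=20)
print('join:   %.4f s for 20 runs' % t_join)
print('concat: %.4f s for 20 runs' % t_concat)
```

Running the same script under another interpreter is one way to check how much the += optimization matters there.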
Upvotes: 6