Reputation: 267
I need to transform ranges into consecutive numbers. The range bounds are ints, and the result should be ints as well. This is what I have so far:
import numpy as np
mydata = np.array([
    [49123400, 49123499],
    [33554333, 33554337]])
numbers_list = np.empty((0))
base_dir = "/foo.csv"
for x in mydata:
    numbers = np.arange(x[0], x[1]+1)
    numbers_list = np.append(numbers_list, numbers, axis=0)
np.savetxt(base_dir, numbers_list, delimiter=";")
What I would like to see is a list like this:
49123400,
49123401,
49123402,...
49123499,
33554333,
33554334,...
33554337
But what I get is:
4.912340000000000000e+07 and so on...
Where am I going wrong? Why does the dtype change from int to float when I append?
Upvotes: 1
Views: 11826
Reputation: 96
I had the same issue when appending columns to a NumPy array. I was using the np.arange() function to make a sample array with one column, then appending columns to it, but the data got messy, as you can see:
[[ 0.00000000e+00 -1.56000000e+00]
[ 1.00000000e+00 2.43000000e+00]
[ 2.00000000e+00 -9.40000000e-01]
...,
[ 4.99700000e+03 -1.99000000e+00]
[ 4.99800000e+03 4.10000000e-01]
[ 4.99900000e+03 -7.00000000e-02]]
The problem didn't go away even after making sure the dtypes were equal, but it was finally solved by using np.zeros() instead of np.arange().
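The scientific notation in the output above is how NumPy prints a float64 array whose values span a wide range. A minimal sketch of how an integer column ends up as floats, assuming the columns were joined with np.column_stack (the answer doesn't say which call was used) and using made-up values:

```python
import numpy as np

# Hypothetical data: an integer index column and a float value column.
index_col = np.arange(5)                                  # integer dtype
value_col = np.array([-1.56, 2.43, -0.94, -1.99, 0.41])   # float64

# column_stack must pick one dtype for the whole 2-D array,
# so the integer column is promoted to float64.
combined = np.column_stack((index_col, value_col))
print(combined.dtype)  # float64

# suppress=True asks NumPy to avoid scientific notation when printing;
# it changes only the display, not the stored values.
with np.printoptions(suppress=True):
    print(combined)
```

A single NumPy array has one dtype, so mixing int and float columns always yields a float array; only the printing can be adjusted.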
Upvotes: 0
Reputation: 152587
One important lesson is that you should always choose the right data structure for your problem. In most cases, if you want to append/concatenate, NumPy is the wrong choice, unless you can trivially set up the final array (with its final shape) and fill it by assigning to slices of it.
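To make the "set up the final array and fill slices" option concrete, here is a minimal sketch for the question's data: compute the total length first, allocate once with an integer dtype, then assign each range to its slice.

```python
import numpy as np

mydata = np.array([[49123400, 49123499],
                   [33554333, 33554337]])

# Length of each inclusive range, and the total length of the result.
lengths = mydata[:, 1] - mydata[:, 0] + 1
result = np.empty(lengths.sum(), dtype=mydata.dtype)  # final shape, integer dtype

# Fill the preallocated array slice by slice; nothing is reallocated.
start = 0
for (lo, hi), n in zip(mydata, lengths):
    result[start:start + n] = np.arange(lo, hi + 1)
    start += n

print(result.dtype, result.shape)
```

Because result is created with the input's integer dtype, the values never pass through float.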
In this case the obvious choice would be to use a normal Python list and range:
mydata = [[49123400, 49123499],
          [33554333, 33554337]]
mynewdata = []
for sublist in mydata:
    mynewdata.extend(range(sublist[0], sublist[1]+1))
>>> mynewdata
[49123400, 49123401, 49123402, 49123403, 49123404, 49123405,
49123406, 49123407, 49123408, 49123409, 49123410, 49123411,
49123412, 49123413, 49123414, 49123415, 49123416, 49123417,
49123418, 49123419, 49123420, 49123421, 49123422, 49123423,
49123424, 49123425, 49123426, 49123427, 49123428, 49123429,
49123430, 49123431, 49123432, 49123433, 49123434, 49123435,
49123436, 49123437, 49123438, 49123439, 49123440, 49123441,
49123442, 49123443, 49123444, 49123445, 49123446, 49123447,
49123448, 49123449, 49123450, 49123451, 49123452, 49123453,
49123454, 49123455, 49123456, 49123457, 49123458, 49123459,
49123460, 49123461, 49123462, 49123463, 49123464, 49123465,
49123466, 49123467, 49123468, 49123469, 49123470, 49123471,
49123472, 49123473, 49123474, 49123475, 49123476, 49123477,
49123478, 49123479, 49123480, 49123481, 49123482, 49123483,
49123484, 49123485, 49123486, 49123487, 49123488, 49123489,
49123490, 49123491, 49123492, 49123493, 49123494, 49123495,
49123496, 49123497, 49123498, 49123499, 33554333, 33554334,
33554335, 33554336, 33554337]
This can be trivially converted to a numpy.array:
>>> np.array(mynewdata)
array([49123400, 49123401, 49123402, 49123403, 49123404, 49123405,
49123406, 49123407, 49123408, 49123409, 49123410, 49123411,
49123412, 49123413, 49123414, 49123415, 49123416, 49123417,
49123418, 49123419, 49123420, 49123421, 49123422, 49123423,
49123424, 49123425, 49123426, 49123427, 49123428, 49123429,
49123430, 49123431, 49123432, 49123433, 49123434, 49123435,
49123436, 49123437, 49123438, 49123439, 49123440, 49123441,
49123442, 49123443, 49123444, 49123445, 49123446, 49123447,
49123448, 49123449, 49123450, 49123451, 49123452, 49123453,
49123454, 49123455, 49123456, 49123457, 49123458, 49123459,
49123460, 49123461, 49123462, 49123463, 49123464, 49123465,
49123466, 49123467, 49123468, 49123469, 49123470, 49123471,
49123472, 49123473, 49123474, 49123475, 49123476, 49123477,
49123478, 49123479, 49123480, 49123481, 49123482, 49123483,
49123484, 49123485, 49123486, 49123487, 49123488, 49123489,
49123490, 49123491, 49123492, 49123493, 49123494, 49123495,
49123496, 49123497, 49123498, 49123499, 33554333, 33554334,
33554335, 33554336, 33554337])
or even simply written to a file without bothering about arrays:
with open('yourfile', 'w') as file:
    file.write(str(mynewdata).replace(',', ';'))
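Note that str(mynewdata) also writes the surrounding brackets and puts everything on a single line. If one number per line is wanted, as in the question, a minimal sketch with plain Python (the filename numbers.txt is a placeholder):

```python
mynewdata = []
for lo, hi in [[49123400, 49123499], [33554333, 33554337]]:
    mynewdata.extend(range(lo, hi + 1))

# One integer per line, no brackets; 'numbers.txt' is a hypothetical filename.
with open('numbers.txt', 'w') as file:
    file.write('\n'.join(str(n) for n in mynewdata) + '\n')
```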
And finally, a note on why your integers were converted to floats:
>>> np.empty((0))
array([], dtype=float64)
np.empty creates a float64 array by default, so appending or concatenating integer arrays to it will always result in a float array. Use np.empty(0, int) if you want an integer array:
>>> np.empty(0, int)
array([], dtype=int64)
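Applied to the loop from the question, a minimal sketch (still using np.append only to stay close to the original code; the list-based approach above avoids it entirely):

```python
import numpy as np

mydata = np.array([[49123400, 49123499],
                   [33554333, 33554337]])

# Start from an *integer* empty array so that appending int arrays
# never promotes the result to float64.
numbers_list = np.empty(0, dtype=int)
for x in mydata:
    numbers = np.arange(x[0], x[1] + 1)
    numbers_list = np.append(numbers_list, numbers)

print(numbers_list.dtype, numbers_list.shape)
```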
Upvotes: 5
Reputation: 231335
In cases like this it helps to step through the code in an interactive session and look at shape and dtype at each step.
In [254]: mydata = np.array( [
...: [49123400, 49123499],
...: [33554333, 33554337]])
In [255]: mydata
Out[255]:
array([[49123400, 49123499],
[33554333, 33554337]])
In [256]: mydata.shape
Out[256]: (2, 2)
In [257]: mydata.dtype
Out[257]: dtype('int32')
In [258]: numbers_list = np.empty((0))
In [259]: numbers_list
Out[259]: array([], dtype=float64)
Note that numbers_list is a float array. Look into giving empty a dtype.
In [260]: x=mydata[0]
In [261]: numbers = np.arange(x[0],x[1]+1)
In [262]: numbers.dtype
Out[262]: dtype('int32')
In [263]: numbers.shape
Out[263]: (100,)
In [264]: numbers_list = np.append(numbers_list, numbers, axis=0)
In [265]: numbers_list.shape
Out[265]: (100,)
In [266]: numbers_list.dtype
Out[266]: dtype('float64')
After concatenating these two arrays, the result is promoted to the float64 dtype of numbers_list. So changing that empty dtype should preserve the int dtype.
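The rule at work is NumPy's type promotion: the concatenated result gets a dtype that can hold both inputs, and np.result_type reports which one that is. A small sketch (the printed values in the comments follow the standard promotion rules):

```python
import numpy as np

# The dtype np.concatenate/np.append must produce for these two inputs:
print(np.result_type(np.float64, np.int32))  # float64: the empty float array wins
print(np.result_type(np.int64, np.int32))    # int64: stays an integer type

# Same rule, observed on actual arrays.
a = np.concatenate([np.empty(0), np.arange(3, dtype=np.int32)])
b = np.concatenate([np.empty(0, dtype=np.int64), np.arange(3, dtype=np.int32)])
print(a.dtype, b.dtype)  # float64 int64
```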
I have been on a crusade against np.append. This is another example of its misuse. It is just a form of np.concatenate, and is often a poor substitute for a list append.
I'd suggest building a list and calling concatenate once:
In [267]: numbers_list = [np.arange(x[0],x[1]+1) for x in mydata]
In [268]: len(numbers_list)
Out[268]: 2
In [269]: np.concatenate(numbers_list)
Out[269]:
array([49123400, 49123401, 49123402, 49123403, 49123404, 49123405,
49123406, 49123407, 49123408, 49123409, 49123410, 49123411,
49123412, 49123413, 49123414, 49123415, 49123416, 49123417,
49123418, 49123419, 49123420, 49123421, 49123422, 49123423,
49123424, 49123425, 49123426, 49123427, 49123428, 49123429,
...
49123496, 49123497, 49123498, 49123499, 33554333, 33554334,
33554335, 33554336, 33554337])
In [270]: _.shape
Out[270]: (105,)
Since you are using savetxt to write the numbers, look at its fmt parameter. The default, '%.18e', is scientific notation. With the correct fmt you will get integers:
In [272]: arr=np.concatenate(numbers_list)
In [273]: np.savetxt('test.txt',arr,fmt='%d',delimiter=',')
In [274]: cat test.txt
49123400
49123401
49123402
49123403
49123404
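Putting the pieces of this answer together, a minimal end-to-end sketch of the question's task (the output filename numbers.csv is a placeholder):

```python
import numpy as np

mydata = np.array([[49123400, 49123499],
                   [33554333, 33554337]])

# Build a list of integer ranges, then concatenate once.
arr = np.concatenate([np.arange(lo, hi + 1) for lo, hi in mydata])

# fmt='%d' writes plain integers instead of the default '%.18e'.
# 'numbers.csv' is a placeholder path; with a 1-D array each value
# goes on its own line, so the delimiter does not come into play.
np.savetxt('numbers.csv', arr, fmt='%d')

with open('numbers.csv') as f:
    first_lines = [next(f).strip() for _ in range(3)]
print(first_lines)  # ['49123400', '49123401', '49123402']
```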
Upvotes: 1