Reputation: 32008
I have a file with integer data, where the first row and first column hold names. I'd like to use genfromtxt or loadtxt and still have numpy read the data as a homogeneous array. To do this I used the skiprows and usecols options, but it did not help.
In the (working) example below I'd expect print(test_array.shape) to give (3, 3) and print(test_array) to give
[[0 0 0]
 [0 1 0]
 [1 0 0]]
Is there any way to achieve what I want without trimming the first row/column with a Unix tool before attempting to load the file? Note that the actual files I want to load are B-I-G (~6 GB), so any solution should not be too computationally intensive.
from __future__ import print_function
from StringIO import StringIO #use io.StringIO with py3
import numpy as np
example_file = StringIO("FID 1 2 3\n11464_ATCACG 0 0 0\n11465_CGATGT 0 1 0\n11466_TTAGGC 1 0 0")
test_array = np.loadtxt(example_file, skiprows=1, usecols=(1,), dtype=int)  # only one column selected
print(test_array.shape) #(3,)
print(test_array) #[0 0 1]
Upvotes: 1
Views: 278
Reputation: 74
You can use the usecols and skip_header arguments of np.genfromtxt. The key is to list all three data columns in usecols rather than just one. Then it works fine:
test_array = np.genfromtxt(example_file, skip_header=1, usecols=(1,2,3))
>>> print(test_array)
[[ 0.  0.  0.]
 [ 0.  1.  0.]
 [ 1.  0.  0.]]
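For completeness, the same fix works with np.loadtxt: list all three data columns in usecols and keep dtype=int to get the integer (3, 3) array the question asked for. A minimal sketch using the example data from the question (io.StringIO here, since the question's StringIO import is Python 2):

```python
from io import StringIO  # Python 3 equivalent of the question's StringIO import
import numpy as np

example_file = StringIO(
    "FID 1 2 3\n"
    "11464_ATCACG 0 0 0\n"
    "11465_CGATGT 0 1 0\n"
    "11466_TTAGGC 1 0 0"
)

# Skip the header row and read only the three integer columns,
# leaving out column 0 (the row names).
test_array = np.loadtxt(example_file, skiprows=1, usecols=(1, 2, 3), dtype=int)

print(test_array.shape)  # (3, 3)
print(test_array)
# [[0 0 0]
#  [0 1 0]
#  [1 0 0]]
```

Both functions also stream the file line by line, so selecting columns this way avoids materializing the name columns at all.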
Upvotes: 1