get genfromtxt/loadtxt to ignore datatype in ignored columns/rows

Question

I have a file with integer data, where the first few lines/columns are used for names.

I'd like to be able to use genfromtxt or loadtxt and still get numpy to read it as a homogeneous array. To do this I used the options skiprows and usecols, but it did not help. In the (working) example below I'd expect print(test_array.shape) to give (3,3) and print(test_array) to give

[[0 0 0]
 [0 1 0]
 [1 0 0]]

Is there any way to achieve what I want without trimming the first rows/columns with a unix tool before attempting to load the file? Note that the actual files I want to load are B-I-G (~6 gigs) so any solution should not be too computationally intensive.

from __future__ import print_function
from StringIO import StringIO #use io.StringIO with py3
import numpy as np

example_file = StringIO("FID 1 2 3\n11464_ATCACG 0 0 0\n11465_CGATGT 0 1 0\n11466_TTAGGC 1 0 0")
test_array = np.loadtxt(example_file, skiprows=1, usecols=(1,), dtype=int)

print(test_array.shape) #(3,)
print(test_array) #[0 0 1]

redrivercrayon · Accepted Answer

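You can use the usecols and skip_header arguments of np.genfromtxt. The example in the question passes usecols=(1,), which selects only one column; list all three data columns instead. The result is a float array by default (pass dtype=int if you need integers):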

>>> test_array = np.genfromtxt(example_file, skip_header=1, usecols=(1,2,3))
>>> print(test_array)
[[ 0.  0.  0.]
 [ 0.  1.  0.]
 [ 1.  0.  0.]]
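
The same idea should also work with np.loadtxt from the question. Below is a minimal sketch for Python 3, assuming the file is whitespace-delimited with one header row and one leading name column. The names header, data_cols and wide_array are only illustrative; the second half assumes you would rather derive the column indices from the header line than type them out, which may be handy for wide files.

```python
from io import StringIO  # Python 3; the question used the Python 2 StringIO module

import numpy as np

example_file = StringIO(
    "FID 1 2 3\n"
    "11464_ATCACG 0 0 0\n"
    "11465_CGATGT 0 1 0\n"
    "11466_TTAGGC 1 0 0\n"
)

# Skip the header row and read only the three numeric columns as integers.
# The name column (index 0) is never converted, so its strings cause no error.
test_array = np.loadtxt(example_file, skiprows=1, usecols=(1, 2, 3), dtype=int)

print(test_array.shape)
print(test_array)

# Variant: build usecols from the header instead of hard-coding it.
example_file.seek(0)                         # rewind the in-memory file
header = example_file.readline().split()     # consumes the header line
data_cols = tuple(range(1, len(header)))     # every column except the first
# The header has already been read, so no skiprows is needed here.
wide_array = np.loadtxt(example_file, usecols=data_cols, dtype=int)
```

With a real file you would pass the filename (or an open file object) in place of example_file. For very large inputs, a narrower dtype such as np.int8 may reduce memory use if the values fit in it.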
