np.genfromtxt returns a list, not an array

Question

I'm using np.genfromtxt() to read a series of comma delimited text files and load into NumPy arrays for downstream processing (and eventually writing to HDF5).
The code works fine (returns an array) when there are 4 or more lines (1 header, 2+ data lines, 1 footer). Checking array.shape after reading a 4-line file gives (2,), because the first and last lines are skipped.

I don't understand what is returned when I only have 3 lines (1 header, 1 data line, 1 footer). Checking array.shape gives (), and when I print the array there are no brackets []. I think it's a list. What do I need to do to get an array when np.genfromtxt() only finds one line of data?

I created an example to mimic the behavior with 2 simple files. (Data and Output follow the source code). Notes: The field names and data type are defined with np.dtype. I use skip_header=1, skip_footer=1 to skip the first and last lines, and usecols=() to only read some columns.

import numpy as np
import glob
dsp_dt = np.dtype([('H', 'S2'), ('YYMMDD', int),
                   ('NAME', 'S40'), ('COUNT', int)])

for dsp_name in glob.glob('data_2019-10-*.txt'):
    print(dsp_name)

    dsp_recarr = np.genfromtxt(dsp_name, delimiter=',', dtype=dsp_dt, 
                               skip_header=1, skip_footer=1, usecols=(1,2,3),
                               names=None, encoding=None)
    print(dsp_recarr.dtype)
    print(dsp_recarr.shape)
    print(dsp_recarr)

File: data_2019-10-01.txt

H,YYMMDD,NAME,COUNT
S,191001,NAME_1,13
S,191001,Overall,13
F,191001

File: data_2019-10-02.txt

H,YYMMDD,NAME,COUNT
D,191002,NODATA,0
F,191002

Output:

data_2019-10-01.txt
[('YYMMDD', '

hpaulj · Accepted Answer

In [92]: dsp_dt = np.dtype ( [('H','S2'), ('YYMMDD',int),   
    ...:           ('NAME','S40'), ('COUNT',int)] )                                              
In [93]: txt="""H,YYMMDD,NAME,COUNT 
    ...: S,191001,NAME_1,13 
    ...: S,191001,Overall,13 
    ...: F,191001"""                                                                             
In [94]: dsp_recarr = np.genfromtxt(txt.splitlines(), delimiter=',', dtype=dsp_dt,  
    ...:                                skip_header=1, skip_footer=1, usecols=(1,2,3), 
    ...:                                names=None, encoding=None)                               
In [95]: dsp_recarr                                                                              
Out[95]: 
array([(191001, b'NAME_1', 13), (191001, b'Overall', 13)],
      dtype=[('YYMMDD', '



With only one data line:

In [97]: dsp_recarr = np.genfromtxt(txt.splitlines(), delimiter=',', dtype=dsp_dt,  
    ...:                                skip_header=1, skip_footer=2, usecols=(1,2,3), 
    ...:                                names=None, encoding=None)                               
In [98]: dsp_recarr                                                                              
Out[98]: 
array((191001, b'NAME_1', 13),
      dtype=[('YYMMDD', '
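
Note that the one-line result above is still a NumPy array, not a list: it is a 0-dimensional structured array, which is why its shape is () and it prints without brackets. A minimal sketch to check this, assuming the same dtype as the question and feeding the contents of the second data file as a list of lines (the names lines and one_row are only for illustration):

```python
import numpy as np

# Same field layout as in the question.
dsp_dt = np.dtype([('H', 'S2'), ('YYMMDD', int),
                   ('NAME', 'S40'), ('COUNT', int)])

# Contents of data_2019-10-02.txt: header, one data line, footer.
lines = ["H,YYMMDD,NAME,COUNT",
         "D,191002,NODATA,0",
         "F,191002"]

one_row = np.genfromtxt(lines, delimiter=',', dtype=dsp_dt,
                        skip_header=1, skip_footer=1, usecols=(1, 2, 3),
                        names=None, encoding=None)

print(type(one_row))   # an ndarray, not a list
print(one_row.ndim)    # 0 dimensions
print(one_row.shape)   # empty tuple
print(one_row['NAME'], one_row['COUNT'])  # fields are still accessible by name
```

Field access by name works on the 0-d array, but iterating over it or indexing it with [0] does not, which is usually what breaks downstream code.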


loadtxt has an ndmin parameter; I don't see the equivalent in genfromtxt.

With reshaping:

In [107]: dsp_recarr.reshape(1)                                                                  
Out[107]: 
array([(191001, b'NAME_1', 13)],
      dtype=[('YYMMDD', '
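
An alternative to calling reshape(1) by hand is np.atleast_1d, which leaves arrays that are already 1-d alone and promotes a 0-d result to shape (1,). Here is a sketch of how that could be wrapped for the loop in the question; read_dsp is a hypothetical helper name, and the inputs are lists of lines standing in for the two data files:

```python
import numpy as np

dsp_dt = np.dtype([('H', 'S2'), ('YYMMDD', int),
                   ('NAME', 'S40'), ('COUNT', int)])

def read_dsp(source):
    """Read one file (or list of lines) and always return a 1-d array.

    read_dsp is a hypothetical helper, not part of NumPy.
    """
    arr = np.genfromtxt(source, delimiter=',', dtype=dsp_dt,
                        skip_header=1, skip_footer=1, usecols=(1, 2, 3),
                        names=None, encoding=None)
    # A single data line yields a 0-d array; promote it to shape (1,).
    return np.atleast_1d(arr)

# Stand-ins for data_2019-10-01.txt and data_2019-10-02.txt.
multi = read_dsp(["H,YYMMDD,NAME,COUNT",
                  "S,191001,NAME_1,13",
                  "S,191001,Overall,13",
                  "F,191001"])
single = read_dsp(["H,YYMMDD,NAME,COUNT",
                   "D,191002,NODATA,0",
                   "F,191002"])

print(multi.shape)   # two records
print(single.shape)  # one record, but now 1-d
```

In the original loop, read_dsp(dsp_name) would take the place of the direct genfromtxt call. As far as I know, recent NumPy releases (1.23 and later) also accept ndmin=1 in genfromtxt itself; check the documentation for your installed version before relying on that.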
