Reputation: 1

Why do I keep getting a nan set when coding?

I am trying to convert my csv file into a numpy array so I can manipulate the numbers and then graph them. I printed my csv file and got:

               ra              dec
0       15:09:11.8     -34:13:44.9
1       09:19:46.8   +33:44:58.452
2     05:15:43.488   +19:21:46.692
3     04:19:12.096    +55:52:43.32

... there's more data (101 lines x 2 columns), but it is just numbers. I want to convert the ra and dec values from their current units to degrees, and I thought I could do this by turning each column into a numpy array. But when I ran:

import numpy as np
np_array = np.genfromtxt(r'C:\Users\nstev\Downloads\S190930t.csv',delimiter=".", skip_header=1, usecols=(4))
print(np_array)

I get:

[nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
nan nan nan nan nan nan nan nan nan nan]

I keep changing the delimiter. With a colon I got the same result. With a semicolon or a plus sign I got an error saying that it got 2 columns instead of 1. I do not know what to change so that I stop getting this array of nan. Can someone help, please?

Upvotes: 0

Answers (2)

hpaulj

Reputation: 231665

With a copy-n-paste of your file sample:

In [208]: data = np.genfromtxt('stack59761369.csv',encoding=None,dtype=None,names=True)          
In [209]: data                                                                                   
Out[209]: 
array([('15:09:11.8', '-34:13:44.9'), ('09:19:46.8', '+33:44:58.452'),
       ('05:15:43.488', '+19:21:46.692'),
       ('04:19:12.096', '+55:52:43.32')],
      dtype=[('ra', '<U12'), ('dec', '<U13')])

With dtype=None and names=True I get a 1d structured array with 2 fields (ra and dec), each holding strings.

In [210]: data['ra']                                                                             
Out[210]: 
array(['15:09:11.8', '09:19:46.8', '05:15:43.488', '04:19:12.096'],
      dtype='<U12')
In [211]: np.char.split(data['ra'],':')                                                          
Out[211]: 
array([list(['15', '09', '11.8']), list(['09', '19', '46.8']),
       list(['05', '15', '43.488']), list(['04', '19', '12.096'])],
      dtype=object)

this split gives an object dtype array with lists. They can be joined into one 2d array with vstack:

In [212]: np.vstack(np.char.split(data['ra'],':'))                                               
Out[212]: 
array([['15', '09', '11.8'],
       ['09', '19', '46.8'],
       ['05', '15', '43.488'],
       ['04', '19', '12.096']], dtype='<U6')

and converted to floats with:

In [213]: np.vstack(np.char.split(data['ra'],':')).astype(float)                                 
Out[213]: 
array([[15.   ,  9.   , 11.8  ],
       [ 9.   , 19.   , 46.8  ],
       [ 5.   , 15.   , 43.488],
       [ 4.   , 19.   , 12.096]])
In [214]: np.vstack(np.char.split(data['dec'],':')).astype(float)                                
Out[214]: 
array([[-34.   ,  13.   ,  44.9  ],
       [ 33.   ,  44.   ,  58.452],
       [ 19.   ,  21.   ,  46.692],
       [ 55.   ,  52.   ,  43.32 ]])

pandas

In [256]: df = pd.read_csv('stack59761369.csv', sep=r'\s+')
In [257]: df                                                                                     
Out[257]: 
             ra            dec
0    15:09:11.8    -34:13:44.9
1    09:19:46.8  +33:44:58.452
2  05:15:43.488  +19:21:46.692
3  04:19:12.096   +55:52:43.32
In [258]: df['ra'].str.split(':',expand=True).astype(float)                                      
Out[258]: 
      0     1       2
0  15.0   9.0  11.800
1   9.0  19.0  46.800
2   5.0  15.0  43.488
3   4.0  19.0  12.096
In [259]: df['dec'].str.split(':',expand=True).astype(float)                                     
Out[259]: 
      0     1       2
0 -34.0  13.0  44.900
1  33.0  44.0  58.452
2  19.0  21.0  46.692
3  55.0  52.0  43.320
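
The three split columns can be collapsed into one decimal number per row. What follows is only a sketch under my own assumptions: `column_to_decimal` is a hypothetical helper, the two-row frame is typed in by hand from the sample, and the sign is read from the text rather than from the first number so that the minutes and seconds get the same sign as the leading value.

```python
import pandas as pd

# Two rows of the sample, typed in by hand (assumption: columns hold strings).
df = pd.DataFrame({
    "ra": ["15:09:11.8", "09:19:46.8"],
    "dec": ["-34:13:44.9", "+33:44:58.452"],
})


def column_to_decimal(col):
    """Hypothetical helper: 'a:b:c' strings -> a + b/60 + c/3600, sign kept."""
    parts = col.str.split(":", expand=True).astype(float)
    # Read the sign from the text so all three terms share it.
    sign = col.str.startswith("-").map({True: -1.0, False: 1.0})
    return sign * (parts[0].abs() + parts[1] / 60 + parts[2] / 3600)


df["dec_deg"] = column_to_decimal(df["dec"])
df["ra_value"] = column_to_decimal(df["ra"])  # same unit as the leading ra number
print(df)
```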

direct line read

In [279]: lines = []                                                                             
In [280]: with open('stack59761369.csv') as f: 
     ...:     header=f.readline() 
     ...:     for row in f: 
     ...:         alist = row.split() 
     ...:         alist = [[float(i) for i in astr.split(':')] for astr in alist] 
     ...:         lines.append(alist) 
     ...:                                                                                        
In [281]: lines                                                                                  
Out[281]: 
[[[15.0, 9.0, 11.8], [-34.0, 13.0, 44.9]],
 [[9.0, 19.0, 46.8], [33.0, 44.0, 58.452]],
 [[5.0, 15.0, 43.488], [19.0, 21.0, 46.692]],
 [[4.0, 19.0, 12.096], [55.0, 52.0, 43.32]]]
In [282]: np.array(lines)                                                                        
Out[282]: 
array([[[ 15.   ,   9.   ,  11.8  ],
        [-34.   ,  13.   ,  44.9  ]],

       [[  9.   ,  19.   ,  46.8  ],
        [ 33.   ,  44.   ,  58.452]],

       [[  5.   ,  15.   ,  43.488],
        [ 19.   ,  21.   ,  46.692]],

       [[  4.   ,  19.   ,  12.096],
        [ 55.   ,  52.   ,  43.32 ]]])
In [283]: _.shape                                                                                
Out[283]: (4, 2, 3)

The first dimension is the number of rows, the second is the 2 columns (ra, dec), and the third is the 3 colon-separated values within each column entry.
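
To make the axis layout concrete, here is a small sketch that rebuilds the first two rows of that array by hand and slices out individual pieces. The names `arr`, `ra_parts`, `dec_parts` and `dec_minutes` are mine, not part of the session above.

```python
import numpy as np

# First two rows of the (rows, columns, parts) array, typed in by hand.
arr = np.array([
    [[15.0, 9.0, 11.8], [-34.0, 13.0, 44.9]],
    [[9.0, 19.0, 46.8], [33.0, 44.0, 58.452]],
])

ra_parts = arr[:, 0, :]     # every row, column 0 (ra)  -> shape (2, 3)
dec_parts = arr[:, 1, :]    # every row, column 1 (dec) -> shape (2, 3)
dec_minutes = arr[:, 1, 1]  # middle value of every dec entry -> shape (2,)

print(ra_parts.shape, dec_parts.shape, dec_minutes)
```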

conversion to degree

In [285]: _282@[1,1/60,1/3600]
Out[285]: 
array([[ 15.15327778, -33.77086111],
       [  9.32966667,  33.74957   ],
       [  5.26208   ,  19.36297   ],
       [  4.32002667,  55.8787    ]])

oops, that -34 deg value is wrong: the minutes and seconds were added as positive numbers. All 3 terms of an element have to have the same sign.

correction

Identify the elements with a negative degree:

In [296]: mask = np.sign(_282[:,:,0])                                                            
In [297]: mask                                                                                   
Out[297]: 
array([[ 1., -1.],
       [ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.]])

adjust all 3 terms accordingly:

In [298]: x = np.abs(_282)*mask[:,:,None]                                                        
In [299]: x                                                                                      
Out[299]: 
array([[[ 15.   ,   9.   ,  11.8  ],
        [-34.   , -13.   , -44.9  ]],

       [[  9.   ,  19.   ,  46.8  ],
        [ 33.   ,  44.   ,  58.452]],

       [[  5.   ,  15.   ,  43.488],
        [ 19.   ,  21.   ,  46.692]],

       [[  4.   ,  19.   ,  12.096],
        [ 55.   ,  52.   ,  43.32 ]]])
In [300]: x@[1, 1/60, 1/3600]
Out[300]: 
array([[ 15.15327778, -34.22913889],
       [  9.32966667,  33.74957   ],
       [  5.26208   ,  19.36297   ],
       [  4.32002667,  55.8787    ]])
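
One caveat with a mask built from np.sign of the first value: an entry such as -00:30:00 has a leading value of -0.0, whose sign is zero, so the whole entry would be multiplied by 0. A possible way around that is to read the sign from the text itself. The sketch below uses a hypothetical helper, `sexagesimal_to_decimal`, which is not part of the session above. It also assumes that ra is given in hours (24 h = 360 deg), which the question does not actually state; if ra is already in degrees, drop the factor of 15.

```python
def sexagesimal_to_decimal(text):
    """Convert 'a:b:c' (optionally signed) to a + b/60 + c/3600.

    Hypothetical helper. The sign is taken from the text, so
    '-00:30:00' stays negative.
    """
    text = text.strip()
    sign = -1.0 if text.startswith("-") else 1.0
    first, minutes, seconds = (float(p) for p in text.lstrip("+-").split(":"))
    return sign * (first + minutes / 60 + seconds / 3600)


dec_deg = sexagesimal_to_decimal("-34:13:44.9")
ra_hours = sexagesimal_to_decimal("15:09:11.8")
# Assumption: ra is in hours, so 1 hour corresponds to 15 degrees.
ra_deg = ra_hours * 15

print(dec_deg, ra_deg)
```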

Upvotes: 1

Daniel Haley

Reputation: 52888

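The nan is NaN (Not a Number). genfromtxt defaults to dtype=float, and a value like 15:09:11.8 cannot be parsed as a float, so it comes back as nan. Try setting the data type to None (dtype=None).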

Also, try omitting delimiter. By default, any run of consecutive whitespace acts as the delimiter.

Not sure what you're expecting, but maybe this will be a better starting point...

import numpy as np

np_array = np.genfromtxt(r"C:\Users\nstev\Downloads\S190930t.csv", skip_header=1, dtype=None, encoding="utf-8", usecols=(1, 2))
print(np_array)

printed output...

[['15:09:11.8' '-34:13:44.9']
 ['09:19:46.8' '+33:44:58.452']
 ['05:15:43.488' '+19:21:46.692']
 ['04:19:12.096' '+55:52:43.32']]
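
To see the difference side by side, here is a minimal sketch that reads the four sample rows from an in-memory string instead of the real file. The `sample` text is typed in by hand and has no index column, so usecols is not needed here; that layout is my assumption, not necessarily what the actual csv looks like.

```python
import io
import numpy as np

# Four sample rows from the question, as whitespace-separated text.
sample = """ra dec
15:09:11.8 -34:13:44.9
09:19:46.8 +33:44:58.452
05:15:43.488 +19:21:46.692
04:19:12.096 +55:52:43.32
"""

# Default dtype is float: '15:09:11.8' is not a valid float, so every field is nan.
as_float = np.genfromtxt(io.StringIO(sample), skip_header=1)

# dtype=None lets genfromtxt infer the type of each column; here both are strings.
as_text = np.genfromtxt(io.StringIO(sample), skip_header=1,
                        dtype=None, encoding="utf-8")

print(as_float)
print(as_text)
```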

Disclaimer: I don't use numpy. I based my answer on https://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html