Marcom

Reputation: 4751

genfromtxt returning NaN rows

I am trying to read a CSV file with NumPy, using the following code:

from numpy import genfromtxt
data = genfromtxt(open('errerr.csv', "r"), names=True, delimiter=',')

and this is what comes out:

  (nan, nan, nan, nan, nan, nan, nan),
       (nan, nan, nan, nan, nan, nan, nan),
       (nan, nan, nan, nan, nan, nan, nan)], 
      dtype=[('name', '<f8'), ('severity', '<f8'), ('Message', '<f8'), ('AppDomainName', '<f8'), ('ProcessName', '<f8'), ('clientid', '<f8'), ('type', '<f8')])

The dtype looks fine to me.

Just to prove I'm not going crazy, I tried this code:

import csv
f = open('errors.csv', 'rt')
reader = csv.reader(f)
data = [] 
for r in reader: 
    data.append(r)
f.close()

which works great, but I'm trying to figure out what's going on with genfromtxt.

Here is a sample from the CSV:

name,severity,Message,AppDomainName,ProcessName,clientid,type
 Strings strings,Error,")  Thread Name:  Extended Properties:",SunDSrvc.exe,C:\Program Files\\SunDSrvc.exe,5DAA9377 ,Client
 Strings strings,Error,")  Thread Name:  Extended Properties:",SunDSrvc.exe,C:\Program Files\\SunDSrvc.exe,5DAA9377 ,Client
 Strings strings,Error,")  Thread Name:  Extended Properties:",SunDSrvc.exe,C:\Program Files\\SunDSrvc.exe,5DAA9377 ,Client

Upvotes: 12

Views: 36881

Answers (3)

Tony Arnold

Reputation: 9

I had exactly the same problem reading data saved in CSV format from Excel. It drove me nuts for a few hours until I found the cause. In Excel, the first CSV format on the save menu is UTF-8 with BOM. The byte-order mark at the start of the file makes the first cell come back as nan. If you save the file using one of the other CSV formats on the menu, such as CSV (Comma delimited), CSV (Macintosh), or CSV (MS-DOS), genfromtxt works without nan values.
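If re-saving from Excel is not an option, one possible workaround is to let Python strip the byte-order mark while reading. The sketch below is only an illustration: it writes a small, made-up numeric file with a BOM to a temporary path, then opens it with the standard "utf-8-sig" codec before handing the file object to genfromtxt. The file contents and the temporary path are assumptions, not the data from the question.

```python
import os
import tempfile

import numpy as np

# Hypothetical two-row CSV that starts with a UTF-8 byte-order mark,
# similar to what Excel's UTF-8 CSV export writes.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as f:
    f.write(b"\xef\xbb\xbf1,2\n3,4\n")

# The "utf-8-sig" codec drops the BOM, so genfromtxt never sees it
# glued onto the first cell.
with open(path, "r", encoding="utf-8-sig") as f:
    data = np.genfromtxt(f, delimiter=",")

os.remove(path)
print(data)
```

Opening the same file with plain "utf-8" would leave the BOM attached to the first value, which is the situation described above.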

Upvotes: 0

ArgiesDario

Reputation: 73

You should also add encoding=None to avoid the deprecation warning:

VisibleDeprecationWarning: Reading unicode strings without specifying the encoding argument is deprecated. Set the encoding, use None for the system default.

Your line should look like this:

np.genfromtxt(txt, delimiter=',', names=True, dtype=None, encoding=None)

Upvotes: 3

hpaulj

Reputation: 231665

Your dtype isn't fine. It's specifying '<f8', a float, for each of the fields. You want strings. Try dtype=None:

 np.genfromtxt(txt,delimiter=',',names=True,dtype=None)

which produces:

array([ ('Strings strings', 'Error', '")  Thread Name:  Extended Properties:"', 'SunDSrvc.exe', 'C:\\Program Files\\SunDSrvc.exe', '5DAA9377 ', 'Client'),
       ('Strings strings', 'Error', '")  Thread Name:  Extended Properties:"', 'SunDSrvc.exe', 'C:\\Program Files\\SunDSrvc.exe', '5DAA9377 ', 'Client'),
       ('Strings strings', 'Error', '")  Thread Name:  Extended Properties:"', 'SunDSrvc.exe', 'C:\\Program Files\\SunDSrvc.exe', '5DAA9377 ', 'Client')], 
      dtype=[('name', 'S15'), ('severity', 'S5'), ('Message', 'S39'), ('AppDomainName', 'S12'), ('ProcessName', 'S29'), ('clientid', 'S9'), ('type', 'S6')])

(I have removed extraneous stuff about delimiters within quotes)
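For anyone who wants to reproduce the difference, here is a minimal sketch. It assumes a shortened, three-column version of the sample (the `text` string is made up for illustration) and reads it twice from an in-memory buffer: once with the default float dtype and once with dtype=None. The encoding argument is included so that text columns come back as str rather than bytes on recent NumPy versions.

```python
import io

import numpy as np

# Hypothetical, shortened stand-in for the CSV in the question.
text = (
    "name,severity,type\n"
    "Strings strings,Error,Client\n"
    "Strings strings,Error,Client\n"
)

# Default dtype is float, so every text cell fails to convert and becomes nan.
as_float = np.genfromtxt(io.StringIO(text), delimiter=",", names=True)

# dtype=None asks genfromtxt to infer a type for each column instead.
as_text = np.genfromtxt(
    io.StringIO(text), delimiter=",", names=True, dtype=None, encoding="utf-8"
)

print(as_float["severity"])  # all nan
print(as_text["severity"])   # the original strings
```

Note that genfromtxt splits on every delimiter, including commas inside quoted fields, so a file with quoted commas may still be easier to handle with the csv module.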

Upvotes: 18
