Vismark Juarez

Reputation: 773

How to use NumPy to read in a CSV file containing strings and float values into a 2-D array

I need to use NumPy (and only NumPy, not pandas, scikit-learn, etc.) to read in a CSV file. The file's contents look like this:

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S

I am reading and printing the data as follows:

dataset = np.genfromtxt(dataset_path, delimiter=',', names=True, skip_header=1)
print(dataset)

The file is being read in, but in the output the string information is missing (it appears as nan):

[(  2., 1., 1., nan, nan, nan, 38.  , 1., 0.,          nan,  71.2833, nan, nan)
 (  3., 1., 3., nan, nan, nan, 26.  , 0., 0.,          nan,   7.925 , nan, nan)
 (  4., 1., 1., nan, nan, nan, 35.  , 1., 0., 1.138030e+05,  53.1   , nan, nan)
 (  5., 0., 3., nan, nan, nan, 35.  , 0., 0., 3.734500e+05,   8.05  , nan, nan)
 (  6., 0., 3., nan, nan, nan,   nan, 0., 0., 3.308770e+05,   8.4583, nan, nan)]

How can I read this CSV file, keeping the comma as the delimiter, and also read in the string values?

Upvotes: 1

Views: 368

Answers (1)

Aagam Sheth

Reputation: 691

For a file with a consistent number of columns and mixed data types, use:

import numpy as np
np.genfromtxt('filename', dtype=None, delimiter=",")

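Passing dtype=None makes genfromtxt infer the type of each column individually, so string columns are kept as strings instead of becoming nan. The result is a 1-D structured array rather than a plain 2-D array, so you access a column by its field name (for example, arr['f0'] for the first column of the returned array arr) instead of by a column index.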
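
As a rough sketch of how this could look with the data from the question: the rows are inlined through io.StringIO here only so the example is self-contained, and the variable names (csv_text, data, rec) are my own. I am assuming skip_header=1 to drop the header line and encoding="utf-8" so that strings come back as str rather than bytes. Note that genfromtxt does not understand quoting, so the comma inside the quoted Name splits that column in two, giving 13 fields named f0 to f12.

```python
import io

import numpy as np

# First rows of the file from the question, inlined for a self-contained sketch.
csv_text = '''PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S
'''

data = np.genfromtxt(
    io.StringIO(csv_text),  # a file path works here as well
    dtype=None,             # infer a separate type for every column
    delimiter=",",
    skip_header=1,          # skip the header line (12 names vs. 13 split fields)
    encoding="utf-8",       # return str instead of bytes for text columns
)

# Default field names are f0, f1, ...; the quoted Name occupies f3 and f4.
print(data.dtype.names)

sex = data["f5"]    # string column
fare = data["f10"]  # float column
print(sex)
print(fare.mean())

# Attribute-style access needs a record-array view of the same data.
rec = data.view(np.recarray)
print(rec.f6)  # the Age column
```

If the quoted names must stay in one piece, genfromtxt alone will not do it, since it splits on every comma regardless of quotes.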

Upvotes: 2
