Reputation: 773
I need to use NumPy (and only NumPy, not Pandas, scikit-learn, etc.) to read in a CSV file. The file contains rows that look like this:
PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S
I am reading and printing the data as follows:
dataset = np.genfromtxt(dataset_path, delimiter=',', names=True, skip_header=1)
print(dataset)
The file is read in, but in the output the string information is missing (it appears as nan):
[( 2., 1., 1., nan, nan, nan, 38. , 1., 0., nan, 71.2833, nan, nan)
( 3., 1., 3., nan, nan, nan, 26. , 0., 0., nan, 7.925 , nan, nan)
( 4., 1., 1., nan, nan, nan, 35. , 1., 0., 1.138030e+05, 53.1 , nan, nan)
( 5., 0., 3., nan, nan, nan, 35. , 0., 0., 3.734500e+05, 8.05 , nan, nan)
( 6., 0., 3., nan, nan, nan, nan, 0., 0., 3.308770e+05, 8.4583, nan, nan)]
How can I read this CSV file, keeping the comma as the delimiter, and also read in the string values?
Upvotes: 1
Views: 368
Reputation: 691
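For a consistent number of columns and mixed data types, use:
import numpy as np
data = np.genfromtxt('filename', dtype=None, delimiter=',')
With dtype=None, genfromtxt infers the type of each column separately instead of forcing everything to float, which is why the strings no longer come back as nan. The result is a structured array, so you access a column by its field name, for example data['f0'] (or by the header name if you also pass names=True).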
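One caveat for this particular file: genfromtxt does not treat quotes specially, so the comma inside "Braund, Mr. Owen Harris" is taken as a delimiter (the question's output shows 13 values per row for 12 header columns, which is consistent with that). Below is a minimal sketch of one way around it, assuming the standard-library csv module is acceptable alongside NumPy and that no field contains a tab. The helper name read_quoted_csv and the inline SAMPLE text are made up for illustration; the sample rows are copied from the question.

```python
import csv
import io

import numpy as np

# Sample rows copied from the question; in practice, open the real file instead.
SAMPLE = '''PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S
'''


def read_quoted_csv(handle):
    """Hypothetical helper: split quoted CSV with csv.reader, then let
    genfromtxt infer the column types."""
    # csv.reader understands quoting, so a quoted name stays a single field.
    rows = csv.reader(handle)
    # Re-join the fields with tabs (assumption: no field contains a tab).
    lines = ["\t".join(fields) for fields in rows]
    # names=True takes the field names from the first line; do not also pass
    # skip_header=1, or the header is dropped and the first data row is
    # consumed as names.
    return np.genfromtxt(lines, delimiter="\t", names=True,
                         dtype=None, encoding="utf-8")


data = read_quoted_csv(io.StringIO(SAMPLE))
print(data.dtype.names)  # field names taken from the header row
print(data["Name"])      # string column, accessed by field name
print(data["Fare"])      # numeric column, inferred as float
```

For a real file, something like `with open(dataset_path, newline="") as f: data = read_quoted_csv(f)` should work the same way, where dataset_path is the variable from the question.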
Upvotes: 2