How to load csv file containing strings and numbers using genfromtxt?

Question

I'm trying to load a CSV file into a NumPy array for machine learning. Until now I have only worked with int or float data, but my current CSV contains strings, floats and ints, so I'm having trouble with the dtype argument. My dataset has 41188 samples and 8 features, e.g.:

47;"university.degree";"yes";176;1;93.994;-36.4;4.857;"no"

I know that if I specify dtype=None, the type of each column is determined from its contents:

data = np.genfromtxt(filename, dtype=None, delimiter=";", skip_header=1)

but it apparently doesn't work. First of all, the result of genfromtxt is a NumPy ndarray with the following shape:

In [2]: data.shape
Out[2]: (41188,)

while I expected (41188,8).
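The one-dimensional shape looks consistent with how genfromtxt handles mixed columns: with dtype=None, each column gets its own type and the result is a structured array with one record per row, not a 2-D array. Below is a minimal sketch on two rows of in-memory data. The header names are hypothetical (the real header is not shown here), and encoding="utf-8" is an assumption about the file.

```python
import io
import numpy as np

# Two sample rows in the same format as the file; the header names are made up.
sample = io.StringIO(
    'age;education;default;duration;campaign;cpi;cci;euribor;y\n'
    '47;"university.degree";"yes";176;1;93.994;-36.4;4.857;"no"\n'
    '39;"high.school";"no";210;2;93.994;-36.4;4.857;"yes"\n'
)

# names=True takes the field names from the header line,
# so skip_header is not needed here.
data = np.genfromtxt(sample, dtype=None, delimiter=";",
                     names=True, encoding="utf-8")

print(data.shape)        # one record per row, so the array is 1-D
print(data.dtype.names)  # one named field per column
print(data["age"])       # a column is selected by field name
print(data["education"]) # string fields keep their surrounding quotes
```

Columns are then accessed by field name (data["age"]) rather than by a second index, and the numeric fields can be stacked into an ordinary 2-D float array if a model needs one.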

If I use the default dtype instead:

data2 = np.genfromtxt(filename, delimiter=";", skip_header=1)

I obtain the following shape of data:

In [4]: data2.shape
Out[4]: (41188,8)
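The 2-D shape here comes at a cost: the default dtype is float, so every value that cannot be parsed as a number is replaced by nan. A small sketch of this on in-memory data follows; the three-column layout and header names are hypothetical and only serve as illustration.

```python
import io
import numpy as np

# A reduced, made-up version of the file: one numeric and two string columns.
sample = io.StringIO(
    'age;education;y\n'
    '47;"university.degree";"no"\n'
    '39;"high.school";"yes"\n'
)

# Default dtype (float): the result is a plain 2-D float array.
data2 = np.genfromtxt(sample, delimiter=";", skip_header=1)

print(data2.shape)  # rows x columns
print(data2)        # the string columns are filled with nan
```

So the shape matches what I expected, but the string columns are lost.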

Secondly, with dtype=None I obtain the following deprecation warning:

VisibleDeprecationWarning: Reading unicode strings without specifying the encoding argument is deprecated. Set the encoding, use None for the system default.

which I can fix by passing the following (is this correct?):

encoding='ASCII'

I have two questions:

How can I set the correct type for each column?
Why do I have to set the encoding?
