How do I directly cast a numpy array from a csv file using np.genfromtxt?

Question

I am trying to create a NumPy array directly from a CSV file. I read online that np.genfromtxt can do this, but when I tried it every value came back as nan. The dataset is the NYC taxi data from Kaggle.

import numpy as np

taxi = np.genfromtxt("nyc_taxis.csv", delimiter=";", skip_header=1)

print(taxi)

The output is:

array([nan, nan, nan, ..., nan, nan, nan])

I am trying to practice efficiency and use as few lines of code in this project as possible.

I have also tried the np.loadtxt() function, but it raised this error:

taxi = np.loadtxt("nyc_taxis.csv", delimiter=";", dtype =np.float, skiprows=1)


ValueError                                Traceback (most recent call last)
 in 
----> 1 taxi = np.loadtxt("nyc_taxis.csv", delimiter=";", dtype =np.float, skiprows=1)
      2 
      3 taxi

~\anaconda3\lib\site-packages\numpy\lib\npyio.py in loadtxt(fname, dtype, comments, delimiter, converters, skiprows, usecols, unpack, ndmin, encoding, max_rows)
   1157         # converting the data
   1158         X = None
-> 1159         for x in read_data(_loadtxt_chunksize):
   1160             if X is None:
   1161                 X = np.array(x, dtype)

~\anaconda3\lib\site-packages\numpy\lib\npyio.py in read_data(chunk_size)
   1085 
   1086             # Convert each value according to its column and store
-> 1087             items = [conv(val) for (conv, val) in zip(converters, vals)]
   1088 
   1089             # Then pack it according to the dtype's nesting

~\anaconda3\lib\site-packages\numpy\lib\npyio.py in <listcomp>(.0)
   1085 
   1086             # Convert each value according to its column and store
-> 1087             items = [conv(val) for (conv, val) in zip(converters, vals)]
   1088 
   1089             # Then pack it according to the dtype's nesting

~\anaconda3\lib\site-packages\numpy\lib\npyio.py in floatconv(x)
    792         if '0x' in x:
    793             return float.fromhex(x)
--> 794         return float(x)
    795 
    796     typ = dtype.type

ValueError: could not convert string to float: '2016,1,1,5,0,2,4,21.00,2037,52.00,0.80,5.54,11.65,69.99,1'

Any and all help is appreciated.

Wt.N · Accepted Answer

You have to set delimiter=',' because your file is comma-separated, not semicolon-separated.
You also have to set the dtype for string columns manually; otherwise their values become nan. (Only the columns whose dtype is specified are loaded, so I set a dtype for every column.)
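For the file in the question, the row quoted in the ValueError contains only numbers, so changing the delimiter alone may be enough. The following is a minimal sketch under that assumption: it uses an in-memory stand-in for nyc_taxis.csv, the header names col0..col14 are hypothetical, and the single data row from the error message is repeated twice so the result is two-dimensional.

```python
import io
import numpy as np

# Stand-in for nyc_taxis.csv. The header names are hypothetical; the data
# row is the one quoted in the ValueError, repeated to get two rows.
row = "2016,1,1,5,0,2,4,21.00,2037,52.00,0.80,5.54,11.65,69.99,1"
header = ",".join(f"col{i}" for i in range(15))
sample = "\n".join([header, row, row]) + "\n"

# Wrong delimiter: each line is treated as one field that cannot be
# parsed as a float, so every row collapses to a single nan.
wrong = np.genfromtxt(io.StringIO(sample), delimiter=";", skip_header=1)

# Correct delimiter: each line splits into 15 numeric fields.
right = np.genfromtxt(io.StringIO(sample), delimiter=",", skip_header=1)

print(wrong.shape, right.shape)
```

With the real file, the same call would take the filename in place of the io.StringIO object.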

I downloaded test.csv from Kaggle: New York City Taxi Trip Duration:

nyc.py

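import numpy as np

# One (name, type) pair per CSV column, in file order.
# 'S16' and 'S8' are fixed-width byte strings; longer values are truncated.
dtype = [
    ('id', 'S16'),
    ('vendor_id', np.uint8),
    ('pickup_datetime', 'S16'),
    ('passenger_count', np.uint8),
    ('pickup_longitude', np.float32),
    ('pickup_latitude', np.float32),
    ('dropoff_longitude', np.float32),
    ('dropoff_latitude', np.float32),
    ('store_and_fwd_flag', 'S8'),
]
# skip_header=1 drops the header row, since the field names come from dtype.
csv = np.genfromtxt('test.csv', delimiter=',', skip_header=1, dtype=dtype)
print(csv[:3])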

Here 'S8' means a byte string of up to 8 bytes. The output is:

❯ python nyc.py
[(b'id3004672', 1, b'2016-06-30 23:59', 1, -73.98813, 40.73203 , -73.99017, 40.75668 , b'N')
 (b'id3505355', 1, b'2016-06-30 23:59', 1, -73.9642 , 40.679993, -73.95981, 40.655403, b'N')
 (b'id1217141', 1, b'2016-06-30 23:59', 1, -73.99744, 40.737583, -73.98616, 40.729523, b'N')]
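If writing out the dtype list by hand feels too verbose, genfromtxt can also infer a type per column with dtype=None and take the field names from the header row with names=True. This is a sketch on an in-memory sample, not the real test.csv: the header follows the field names used above, and the two rows are copied from the printed output (so the timestamps appear in their truncated form).

```python
import io
import numpy as np

# Small in-memory stand-in for test.csv (values taken from the output above).
sample = (
    "id,vendor_id,pickup_datetime,passenger_count,pickup_longitude,"
    "pickup_latitude,dropoff_longitude,dropoff_latitude,store_and_fwd_flag\n"
    "id3004672,1,2016-06-30 23:59,1,-73.98813,40.73203,-73.99017,40.75668,N\n"
    "id3505355,1,2016-06-30 23:59,1,-73.9642,40.679993,-73.95981,40.655403,N\n"
)

# dtype=None infers each column's type; names=True reads field names from
# the first row; encoding="utf-8" yields str fields rather than bytes.
arr = np.genfromtxt(
    io.StringIO(sample),
    delimiter=",",
    names=True,
    dtype=None,
    encoding="utf-8",
)

print(arr.dtype.names)
print(arr["pickup_latitude"])
```

The result is still a structured array, so columns are selected by name (arr["id"]) rather than by integer position.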

P.S. I recommend pandas.

https://numpy.org/doc/stable/reference/generated/numpy.genfromtxt.html
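On the pandas recommendation: a rough sketch of what that route could look like, assuming pandas is installed. pandas.read_csv infers a type for each column, and DataFrame.to_numpy() converts the selected columns to a plain NumPy array. The in-memory sample below is illustrative, with rows copied from the printed output above.

```python
import io
import pandas as pd

# Small in-memory stand-in for test.csv (values taken from the output above).
sample = io.StringIO(
    "id,vendor_id,pickup_datetime,passenger_count,pickup_longitude,"
    "pickup_latitude,dropoff_longitude,dropoff_latitude,store_and_fwd_flag\n"
    "id3004672,1,2016-06-30 23:59,1,-73.98813,40.73203,-73.99017,40.75668,N\n"
    "id3505355,1,2016-06-30 23:59,1,-73.9642,40.679993,-73.95981,40.655403,N\n"
)

# read_csv handles the header and mixed column types in one call;
# parse_dates turns the timestamp column into datetime64 values.
df = pd.read_csv(sample, parse_dates=["pickup_datetime"])

# Pick the numeric columns of interest and convert them to a NumPy array.
coords = df[["pickup_longitude", "pickup_latitude"]].to_numpy()

print(df.dtypes)
print(coords)
```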
