Reputation: 71
I am trying to create a NumPy array directly from a CSV file. I read online that this can be done with np.genfromtxt. I tried it, but every value came back as nan. The dataset is the NYC taxi data from Kaggle.
import numpy as np
taxi = np.genfromtxt("nyc_taxis.csv", delimiter=";", skip_header=1)
print(taxi)
The output is:
array([nan, nan, nan, ..., nan, nan, nan])
I am trying to practice efficiency and use as few lines of code in this project as possible.
I also tried np.loadtxt(), but it raised this error:
taxi = np.loadtxt("nyc_taxis.csv", delimiter=";", dtype =np.float, skiprows=1)
ValueError Traceback (most recent call last)
<ipython-input-21-8c3c6082acc0> in <module>
----> 1 taxi = np.loadtxt("nyc_taxis.csv", delimiter=";", dtype =np.float, skiprows=1)
2
3 taxi
~\anaconda3\lib\site-packages\numpy\lib\npyio.py in loadtxt(fname, dtype, comments, delimiter, converters, skiprows, usecols, unpack, ndmin, encoding, max_rows)
1157 # converting the data
1158 X = None
-> 1159 for x in read_data(_loadtxt_chunksize):
1160 if X is None:
1161 X = np.array(x, dtype)
~\anaconda3\lib\site-packages\numpy\lib\npyio.py in read_data(chunk_size)
1085
1086 # Convert each value according to its column and store
-> 1087 items = [conv(val) for (conv, val) in zip(converters, vals)]
1088
1089 # Then pack it according to the dtype's nesting
~\anaconda3\lib\site-packages\numpy\lib\npyio.py in <listcomp>(.0)
1085
1086 # Convert each value according to its column and store
-> 1087 items = [conv(val) for (conv, val) in zip(converters, vals)]
1088
1089 # Then pack it according to the dtype's nesting
~\anaconda3\lib\site-packages\numpy\lib\npyio.py in floatconv(x)
792 if '0x' in x:
793 return float.fromhex(x)
--> 794 return float(x)
795
796 typ = dtype.type
ValueError: could not convert string to float: '2016,1,1,5,0,2,4,21.00,2037,52.00,0.80,5.54,11.65,69.99,1'
Any and all help is appreciated.
Upvotes: 0
Views: 1750
Reputation: 1658
I downloaded test.csv from the Kaggle competition "New York City Taxi Trip Duration" and read it with a comma delimiter and a structured dtype:
nyc.py
import numpy as np
dtype=[
('id', 'S16'),
('vendor_id', np.uint8),
('pickup_datetime', 'S16'),
('passenger_count', np.uint8),
('pickup_longitude', np.float32),
('pickup_latitude', np.float32),
('dropoff_longitude', np.float32),
('dropoff_latitude', np.float32),
('store_and_fwd_flag', 'S8'),
]
csv = np.genfromtxt('test.csv', delimiter=',', skip_header=1, dtype=dtype)
print(csv[:3])
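Here 'S8' means a fixed-width byte string of at most 8 bytes; longer values are truncated to fit (the 'S16' timestamps in the output stop at the minute). Running the script prints: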
❯ python nyc.py
[(b'id3004672', 1, b'2016-06-30 23:59', 1, -73.98813, 40.73203 , -73.99017, 40.75668 , b'N')
(b'id3505355', 1, b'2016-06-30 23:59', 1, -73.9642 , 40.679993, -73.95981, 40.655403, b'N')
(b'id1217141', 1, b'2016-06-30 23:59', 1, -73.99744, 40.737583, -73.98616, 40.729523, b'N')]
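The row quoted in the question's ValueError is comma-separated, so delimiter=";" makes each whole line a single field that cannot be parsed as a float. That would explain both the all-nan result from genfromtxt and the loadtxt error. A minimal sketch of the difference, assuming an in-memory sample instead of the real file: the placeholder header names (c0 to c14) are made up, and the data row is the one from the error message, repeated twice. It also passes the builtin float as dtype, since the np.float alias was removed in NumPy 1.24.

```python
import io
import numpy as np

# Hypothetical stand-in for nyc_taxis.csv: a placeholder header plus the row
# from the question's error message, repeated so the sample has two data rows.
row = "2016,1,1,5,0,2,4,21.00,2037,52.00,0.80,5.54,11.65,69.99,1"
header = ",".join(f"c{i}" for i in range(15))
sample = f"{header}\n{row}\n{row}\n"

# Wrong delimiter: every line is one unparseable field, so each row becomes nan.
wrong = np.genfromtxt(io.StringIO(sample), delimiter=";", skip_header=1)

# Matching delimiter: each line splits into 15 numeric columns.
right = np.genfromtxt(io.StringIO(sample), delimiter=",", skip_header=1)

# loadtxt works the same way once the delimiter matches; note dtype=float.
loaded = np.loadtxt(io.StringIO(sample), delimiter=",", dtype=float, skiprows=1)

print(wrong.shape, right.shape, loaded.shape)
```

For the real file, the same calls should work with the file name in place of io.StringIO(sample), provided every column is numeric.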
P.S. I recommend pandas.
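A rough sketch of the pandas route, assuming a tiny made-up CSV (the column names a, b, c and the values are placeholders, not taken from the Kaggle data). pandas.read_csv infers a type per column, and DataFrame.to_numpy() converts the numeric columns to a plain NumPy array.

```python
import io
import pandas as pd

# Hypothetical sample with two numeric columns and one text column.
sample = "a,b,c\n1,2.5,x\n3,4.5,y\n"

# read_csv uses "," as the default separator and reads the header row as names.
df = pd.read_csv(io.StringIO(sample))

# Keep only the numeric columns and convert them to a NumPy array.
numeric = df.select_dtypes(include="number").to_numpy()

print(df.dtypes)
print(numeric)
```

With the real file, pd.read_csv("nyc_taxis.csv") would replace the io.StringIO call.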
https://numpy.org/doc/stable/reference/generated/numpy.genfromtxt.html
Upvotes: 1