Reputation: 71
I am new to NumPy and am trying to load a CSV file into an array. I was told that np.genfromtxt works well for this and automatically detects and assigns dtypes. The call seemed to work flawlessly until I checked the shape of the array.
import numpy as np
taxi = np.genfromtxt("nyc_taxis.csv", delimiter=",", dtype = None, names = True)
taxi.shape
[out]: (89560,)
I believe this shows that my dataset is now a 1D array. The tutorial I am working through in class ends with taxi.shape as (89560, 15), but it uses a long, tedious for loop and then converts certain columns to floats. I would like to learn a more efficient way.
The first few lines of the array are
array([(2016, 1, 1, 5, 0, 2, 4, 21. , 2037, 52. , 0.8, 5.54, 11.65, 69.99, 1),
(2016, 1, 1, 5, 0, 2, 1, 16.29, 1520, 45. , 1.3, 0. , 8. , 54.3 , 1),
(2016, 1, 1, 5, 0, 2, 6, 12.7 , 1462, 36.5, 1.3, 0. , 0. , 37.8 , 2),
(2016, 1, 1, 5, 0, 2, 6, 8.7 , 1210, 26. , 1.3, 0. , 5.46, 32.76, 1),
(2016, 1, 1, 5, 0, 2, 6, 5.56, 759, 17.5, 1.3, 0. , 0. , 18.8 , 2),
(2016, 1, 1, 5, 0, 4, 2, 21.45, 2004, 52. , 0.8, 0. , 52.8 , 105.6 , 1),
(2016, 1, 1, 5, 0, 2, 6, 8.45, 927, 24.5, 1.3, 0. , 6.45, 32.25, 1),
(2016, 1, 1, 5, 0, 2, 6, 7.3 , 731, 21.5, 1.3, 0. , 0. , 22.8 , 2),
(2016, 1, 1, 5, 0, 2, 5, 36.3 , 2562, 109.5, 0.8, 11.08, 10. , 131.38, 1),
(2016, 1, 1, 5, 0, 6, 2, 12.46, 1351, 36. , 1.3, 0. , 0. , 37.3 , 2)],
So I can see from the output that each row has 15 comma-separated values (i.e. 15 columns), but the shape tells me there are only 89560 rows and no columns. Am I reading this wrong? Is there a way to transform the shape of my taxi array so that it reflects the true number of columns (i.e. 15), as they are in the CSV file?
Any and all help is appreciated
Upvotes: 0
Views: 275
Reputation: 12407
You can use structured_to_unstructured to convert your structured array to an unstructured (regular 2D) one with your desired data type. This assumes all fields can share the same data type; if not, keeping the array structured is better:
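import numpy as np
import numpy.lib.recfunctions as rfn
# np.float was removed in NumPy 1.24; use np.float64 (or the builtin float) instead
taxi = rfn.structured_to_unstructured(taxi, dtype=np.float64)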
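To see why the shape comes out as 1D and what the conversion does, here is a minimal sketch on a tiny in-memory CSV. The column names (year, month, fare, tip) and values are made up for illustration and are not taken from nyc_taxis.csv; the real file would behave the same way with its 15 columns.

```python
import io

import numpy as np
import numpy.lib.recfunctions as rfn

# Tiny stand-in for the real file; column names are hypothetical.
csv_text = """year,month,fare,tip
2016,1,52.0,11.65
2016,1,45.0,8.0
2016,1,36.5,0.0
"""

# names=True with dtype=None produces a structured array:
# one record per row, with the columns stored as named fields in the dtype.
structured = np.genfromtxt(
    io.StringIO(csv_text), delimiter=",", dtype=None, names=True, encoding="utf-8"
)
print(structured.shape)        # (3,)
print(structured.dtype.names)  # field (column) names
print(structured["fare"])      # columns are accessed by name

# Cast every field to float64 to get an ordinary 2D array.
unstructured = rfn.structured_to_unstructured(structured, dtype=np.float64)
print(unstructured.shape)      # (3, 4)
```

So the 1D shape does not mean the columns are lost: they live in the dtype as fields, and len(taxi.dtype.names) gives the column count. Converting to a single float dtype gives up the per-column integer types in exchange for a plain 2D array.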
Upvotes: 1