I have a CSV file whose first three columns look like this:
2011,12,25,...
2011,12,26,...
2011,12,27,...
...
These columns are year, month and day. The other columns contain strings. There are 100 rows and 6 columns in total. I use numpy.loadtxt to read this into an array:
import numpy
input = numpy.loadtxt('file.csv', dtype='i4, i4, i4, S4, S4, S4', delimiter=',')
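To see the behaviour described next, here is a minimal sketch that runs the same loadtxt call on the three sample rows shown further down. Using io.StringIO in place of file.csv is an assumption made only so the snippet runs without a file on disk.

```python
import io

import numpy as np

# Stand-in for file.csv: the three sample rows from the question
csv_text = """2011,12,25,AAA,AAA,AAA
2011,12,26,BBB,BBB,BBB
2011,12,27,CCC,CCC,CCC
"""

data = np.loadtxt(io.StringIO(csv_text),
                  dtype='i4, i4, i4, S4, S4, S4', delimiter=',')

print(data.shape)             # one element per row, not a (rows, 6) grid
print(data.ndim)              # the array is one-dimensional
print(data[0])                # a single record holding all six values of row 0
print(len(data.dtype.names))  # the six columns live in the dtype as fields
```

The six values per row are not a second array dimension; they are fields of the dtype, which is why the shape has only one entry.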
Problem: As I understand it, this loadtxt call should return an array of shape 100x6. However, it returns a 1D array of 100 elements, with each element holding the 6 values of one row.
I want this to be a normal 2D array of shape 100x6. I looked up some resources online. It seems that because some of the columns in the CSV data contain strings, I have to use the dtype argument, and that results in the input being a 1D array of records rather than a 2D array. The examples I found on these sites work fine as long as all the entries in the CSV file are numbers.
What I am looking for is either
Sample CSV file:
2011,12,25,AAA,AAA,AAA
2011,12,26,BBB,BBB,BBB
2011,12,27,CCC,CCC,CCC
You are right that np.loadtxt returns a 1D array, but you can still access the 'columns', which are actually fields in a structured array:
nt = numpy.loadtxt('file.csv', dtype='i4, i4, i4, S4, S4, S4', delimiter=',')
nt
#>>> array([(2011, 12, 25, b'AAA', b'AAA', b'AAA'),
#>>>        (2011, 12, 26, b'BBB', b'BBB', b'BBB'),
#>>>        (2011, 12, 27, b'CCC', b'CCC', b'CCC')],
#>>>       dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4'), ('f3', 'S4'), ('f4', 'S4'), ('f5', 'S4')])
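A minimal, self-contained sketch of the same structured array, showing where the field names come from and how a column (a field) differs from a row (an element). As before, io.StringIO standing in for the CSV file is an assumption so the snippet runs anywhere.

```python
import io

import numpy as np

csv_text = "2011,12,25,AAA,AAA,AAA\n2011,12,26,BBB,BBB,BBB\n2011,12,27,CCC,CCC,CCC\n"
nt = np.loadtxt(io.StringIO(csv_text),
                dtype='i4, i4, i4, S4, S4, S4', delimiter=',')

# The comma-separated dtype string gets auto-generated field names
print(nt.dtype.names)   # ('f0', 'f1', 'f2', 'f3', 'f4', 'f5')

# A column is selected by field name; a row is one element of the 1D array
print(nt['f0'])         # the year column as an int32 array
print(nt[1])            # the second row as a single record
print(nt[1]['f3'])      # one cell: field 'f3' of the second row
```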
It does let you index the fields, but you need to do so by name (f0, f1, f2, ...) rather than by integer index:
nt['f3']
#>>> array([b'AAA', b'BBB', b'CCC'],
#>>> dtype='|S4')
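If positional access is still convenient, one option is to look the generated name up by position in nt.dtype.names. The helper `column` below is a hypothetical convenience function written for this sketch, not part of NumPy; selecting several fields with a list of names is standard NumPy indexing.

```python
import io

import numpy as np

csv_text = "2011,12,25,AAA,AAA,AAA\n2011,12,26,BBB,BBB,BBB\n2011,12,27,CCC,CCC,CCC\n"
nt = np.loadtxt(io.StringIO(csv_text),
                dtype='i4, i4, i4, S4, S4, S4', delimiter=',')


def column(arr, i):
    """Hypothetical helper: return the i-th field of a structured array."""
    return arr[arr.dtype.names[i]]


print(column(nt, 3))    # same values as nt['f3']

# A list of field names selects several fields at once (still a 1D structured array)
date_fields = nt[['f0', 'f1', 'f2']]
print(date_fields.dtype.names)
```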
You can, of course, specify the field names in the dtype:
dtype=[('MEAT', '<i4'), ('CHEESE', '<i4'), ('TOAST', '<i4'), ('BIRD', 'S4'), ('PLANE', 'S4'), ('SOCK', 'S4')]
nt = numpy.loadtxt('/home/joshua/file.csv', dtype=dtype, delimiter=',')
nt['SOCK']
#>>> array([b'AAA', b'BBB', b'CCC'],
#>>> dtype='|S4')
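A self-contained version of the named-dtype approach, run on the question's sample rows. The field names year, month, day, a, b and c are illustrative choices for this sketch, not names from the question; the last lines show that a named field can also drive a boolean filter on the rows.

```python
import io

import numpy as np

csv_text = "2011,12,25,AAA,AAA,AAA\n2011,12,26,BBB,BBB,BBB\n2011,12,27,CCC,CCC,CCC\n"

# Hypothetical field names; any valid identifiers work
dtype = [('year', '<i4'), ('month', '<i4'), ('day', '<i4'),
         ('a', 'S4'), ('b', 'S4'), ('c', 'S4')]
data = np.loadtxt(io.StringIO(csv_text), dtype=dtype, delimiter=',')

print(data['day'])                    # the day column
print(data['c'])                      # the last string column

# Select rows by a condition on one field, then read another field
print(data[data['day'] == 26]['a'])
```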
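This is by design, and it avoids many complications that arise from non-homogeneous arrays: every element of a NumPy array has the same dtype, so rows that mix integers and strings are stored as records (one element per row) rather than as a 2D grid of individually typed cells.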
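If a genuinely 2D array is still needed, a sketch of two workarounds (neither is from the answer above, and io.StringIO again stands in for the file): read every cell as text, which gives a (rows, 6) array but turns the numbers into strings, or read the numeric and string columns separately with usecols so that each part is a homogeneous 2D array.

```python
import io

import numpy as np

csv_text = "2011,12,25,AAA,AAA,AAA\n2011,12,26,BBB,BBB,BBB\n2011,12,27,CCC,CCC,CCC\n"

# Option 1: every cell as a string -> a real 2D array of shape (rows, 6).
# The year/month/day values are strings here and must be converted back.
all_text = np.loadtxt(io.StringIO(csv_text), dtype=str, delimiter=',')
print(all_text.shape)

# Option 2: split by type; each result is a homogeneous 2D array.
# A fresh StringIO is needed per call because each read consumes the buffer.
dates = np.loadtxt(io.StringIO(csv_text), dtype=int,
                   usecols=(0, 1, 2), delimiter=',')
labels = np.loadtxt(io.StringIO(csv_text), dtype=str,
                    usecols=(3, 4, 5), delimiter=',')
print(dates.shape, labels.shape)
```

Option 2 keeps the integer columns numeric, at the cost of holding the data in two arrays that must be kept aligned row by row.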