Reputation: 4352
I have a .csv file with both string- and integer-containing columns. I need to use the numpy.loadtxt method to import the matrix formed from specified columns. How can I do that? Right now I am trying the following:
data = np.loadtxt(open(path_to_data, "rb"), delimiter=",", skiprows=1, usecols=[1:])
Basically, I am trying to read all columns but the first, but it gives an error:
SyntaxError: invalid syntax
because that syntax is not allowed: usecols=[1:]
Upvotes: 1
Views: 13539
Reputation: 3727
Instead of asking numpy.loadtxt() to select columns, simply load them all and then slice the resulting array to remove the first column:
data = np.loadtxt(open(path_to_data, "rb"), delimiter=",", skiprows=1)[:,1:]
That way you don't need to know n, the number of columns, and everything is on one line.
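A minimal sketch of the load-then-slice approach, assuming every column is numeric. The CSV text and the use of io.StringIO are hypothetical stand-ins for the real file at path_to_data:

```python
import io
import numpy as np

# Hypothetical sample data: a header row followed by purely numeric rows.
csv_text = """id,a,b,c
1,10,20,30
2,11,21,31
3,12,22,32
"""

# Load every column (skipping the header), then drop column 0 by slicing.
data = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)[:, 1:]

print(data.shape)      # (3, 3)
print(data.tolist())   # [[10.0, 20.0, 30.0], [11.0, 21.0, 31.0], [12.0, 22.0, 32.0]]
```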
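If the first column holds strings, loadtxt's default float conversion fails while reading. In that case, pass dtype=str to load everything as text, slice off the first column, and then convert the dtype, e.g.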
data = data.astype('float64')
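A sketch of the string-column case, assuming the first column contains text such as names. Reading with dtype=str avoids the conversion error, and astype does the numeric conversion afterwards. The sample rows and io.StringIO are hypothetical stand-ins for the real file:

```python
import io
import numpy as np

# Hypothetical sample: column 0 holds names, which a float load would reject.
csv_text = """name,x,y
alice,1,2
bob,3,4
"""

# Read everything as text, drop the string column, then convert the rest.
raw = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1, dtype=str)
data = raw[:, 1:].astype("float64")

print(data.tolist())  # [[1.0, 2.0], [3.0, 4.0]]
```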
Upvotes: 1
Reputation: 231385
This is the syntax error:
In [153]: [1:]
File "<ipython-input-153-4bac19319341>", line 1
[1:]
^
SyntaxError: invalid syntax
It's not specific to loadtxt.
Use
data = np.loadtxt(open(path_to_data, "rb"), delimiter=",", skiprows=1, usecols=np.arange(1,n))
where n is the total number of columns.
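One way to get n, assuming the first line is a header with the same number of comma-separated fields as the data rows, is a preliminary read of just that line. The sample text and io.StringIO are hypothetical stand-ins for open(path_to_data):

```python
import io
import numpy as np

csv_text = """id,a,b,c
1,10,20,30
2,11,21,31
"""

f = io.StringIO(csv_text)       # stand-in for open(path_to_data)
header = f.readline()           # consume the header row
n = len(header.split(","))      # total number of columns
# The header is already consumed, so skiprows is not needed here.
data = np.loadtxt(f, delimiter=",", usecols=np.arange(1, n))

print(n)           # 4
print(data.shape)  # (2, 3)
```

range(1, n) should work just as well as np.arange(1, n), since usecols accepts any sequence of ints.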
From the loadtxt docstring:
usecols : int or sequence, optional
Which columns to read, with 0 being the first. For example,
``usecols = (1,4,5)`` will extract the 2nd, 5th and 6th columns.
The default, None, results in all columns being read.
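A small check of the docstring's usecols = (1,4,5) example, using two hypothetical rows whose values equal their column numbers (plus 10 on the second row):

```python
import io
import numpy as np

# Six hypothetical numeric columns, numbered 0..5.
rows = io.StringIO("0,1,2,3,4,5\n10,11,12,13,14,15\n")
picked = np.loadtxt(rows, delimiter=",", usecols=(1, 4, 5))

print(picked.tolist())  # [[1.0, 4.0, 5.0], [11.0, 14.0, 15.0]]
```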
If you don't know n and don't want to use a preliminary file read to determine it, genfromtxt might be easier.
data = np.genfromtxt(..., delimiter=',', skip_header=1)
should load all columns, putting nan wherever it can't convert a string into a float. If those nan values are all in the first column, then
data = data[:,1:]
should give you all but the first column.
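A sketch of the genfromtxt route, assuming only the first column contains non-numeric text. Note that genfromtxt names the header-skipping parameter skip_header. The sample rows and io.StringIO are hypothetical stand-ins for the real file:

```python
import io
import numpy as np

csv_text = """name,x,y
alice,1,2
bob,3,4
"""

# Default dtype is float; entries that can't be converted come back as nan.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)
print(np.isnan(data[:, 0]).all())  # True

data = data[:, 1:]    # drop the all-nan first column
print(data.tolist())  # [[1.0, 2.0], [3.0, 4.0]]
```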
genfromtxt is a little more forgiving than loadtxt when it comes to converting strings to floats.
Upvotes: 3