Reputation: 2841
Consider the following data:
61 1 1 15.04 14.96 13.17 9.29 13.96 9.87 13.67 10.25 10.83 12.58 18.50 15.04
61 1 2 14.71 16.88 10.83 6.50 12.62 7.67 11.50 10.04 9.79 9.67 17.54 13.83
61 1 3 18.50 16.88 12.33 10.13 11.17 6.17 11.25 8.04 8.50 7.67 12.75 12.71
The first three columns are year, month and day.
The remaining 12 columns are average windspeeds in knots at 12 locations in a country on that day.
What I want to do is drop the 2nd and 3rd columns (indices 1 and 2) so that I get the following data:
61 15.04 14.96 13.17 9.29 13.96 9.87 13.67 10.25 10.83 12.58 18.50 15.04
61 14.71 16.88 10.83 6.50 12.62 7.67 11.50 10.04 9.79 9.67 17.54 13.83
61 18.50 16.88 12.33 10.13 11.17 6.17 11.25 8.04 8.50 7.67 12.75 12.71
The following works, but I don't like it because it won't scale if I have lots of columns (i.e. many locations) in the data:
import numpy as np
data = np.loadtxt('wind.data')
data_nomonth_noday = data[:,[0,3,4,5,6,7,8,9,10,11,12,13,14]]
Is it possible to achieve this without enumerating the column numbers? Can I do it with slicing?
Upvotes: 2
Views: 570
Reputation: 19
This should work:
import numpy as np
data = np.loadtxt('wind.data')
data_nomonth_noday = np.zeros((data.shape[0],data.shape[1]-2))
data_nomonth_noday[:,0] = data[:,0]
data_nomonth_noday[:,1:] = data[:,3:]
In my opinion this is more readable, flexible and intuitive than some of the other ways of doing this.
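A self-contained check of this approach, assuming the three sample rows from the question are typed in directly instead of being read from wind.data, and cross-checking the result against the explicit index list from the question:

```python
import numpy as np

# The three sample rows from the question, standing in for np.loadtxt('wind.data')
data = np.array([
    [61, 1, 1, 15.04, 14.96, 13.17, 9.29, 13.96, 9.87, 13.67, 10.25, 10.83, 12.58, 18.50, 15.04],
    [61, 1, 2, 14.71, 16.88, 10.83, 6.50, 12.62, 7.67, 11.50, 10.04, 9.79, 9.67, 17.54, 13.83],
    [61, 1, 3, 18.50, 16.88, 12.33, 10.13, 11.17, 6.17, 11.25, 8.04, 8.50, 7.67, 12.75, 12.71],
])

# Preallocate the output: same number of rows, two fewer columns
data_nomonth_noday = np.zeros((data.shape[0], data.shape[1] - 2))
data_nomonth_noday[:, 0] = data[:, 0]    # keep the year column
data_nomonth_noday[:, 1:] = data[:, 3:]  # copy every column after the day

# Same result as enumerating the columns by hand
expected = data[:, [0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]]
print(np.array_equal(data_nomonth_noday, expected))  # True
```

Only the slices 0 and 3: depend on the layout, so adding more location columns needs no code change.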
Upvotes: 1
Reputation: 7353
If a is your numpy array and you want to drop columns 1 and 2, you could do that in a single line:
import numpy as np
delete_cols = [1,2] # list of column numbers to delete
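# sorted() keeps the remaining columns in their original order; set iteration order is not guaranteed
a[:, sorted(set(np.arange(a.shape[-1])) - set(delete_cols))]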
What you need here is proper indexing of the array a.
# list_of_column_numbers = [0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
a[:, list_of_column_numbers]
You can build list_of_column_numbers in one of the following ways:
# Method-1: Direct Declaration
list_of_column_numbers = [0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
# Method-2A: Using Set and Dropping Columns not Needed
# a.shape[-1] = 15
delete_cols = [1,2] # list of column numbers to delete
# sorted() restores the original column order; set iteration order is not guaranteed
list_of_column_numbers = sorted(set(np.arange(a.shape[-1])) - set(delete_cols))
# Method-2B: Make list of column numbers
# a.shape[-1] = 15
list_of_column_numbers = [0] + np.arange(3,a.shape[-1]).tolist()
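A quick sketch confirming that the three methods produce the same index list, assuming a hypothetical 3 x 15 stand-in array in place of the wind data. It uses sorted() on the set difference, since a plain list(set(...)) does not promise any particular order, and range() instead of np.arange so the list holds plain Python ints:

```python
import numpy as np

# Hypothetical stand-in for the wind data: 3 rows x 15 columns
a = np.arange(45, dtype=float).reshape(3, 15)

# Method-1: direct declaration
m1 = [0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]

# Method-2A: set difference, sorted so the columns stay in order
delete_cols = [1, 2]
m2a = sorted(set(range(a.shape[-1])) - set(delete_cols))

# Method-2B: build the list from a range
m2b = [0] + np.arange(3, a.shape[-1]).tolist()

print(m1 == m2a == m2b)  # True
print(a[:, m2a].shape)   # (3, 13)
```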
Upvotes: 0
Reputation: 231325
You can easily generate the indexing array with np.r_.
In [165]: np.r_[0,3:15]
Out[165]: array([ 0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
Under the covers it's just doing:
In [166]: np.concatenate([[0],np.arange(3,15)])
Out[166]: array([ 0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
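The 15 above is hard-coded; a sketch of the same idea that takes the stop value from the array itself, so it keeps working when more location columns are added (the 3 x 15 array is a hypothetical stand-in for the wind data):

```python
import numpy as np

a = np.arange(45, dtype=float).reshape(3, 15)  # hypothetical stand-in for the wind data

# Keep column 0, then everything from column 3 up to the last column
idx = np.r_[0, 3:a.shape[1]]
print(idx.tolist())  # [0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]

result = a[:, idx]
print(result.shape)  # (3, 13)
```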
np.delete, while convenient, ends up doing a similar amount of work. Depending on the deletion index, it either concatenates pieces or constructs a selection mask.
Regardless of the method, the result is a new array, with a copy of the required data (not a view).
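A small sketch illustrating the copy-versus-view point with np.shares_memory. The boolean mask below is written by hand to show what selecting with a mask looks like; it is an illustration, not a claim about np.delete's exact internals. The 3 x 15 array is a hypothetical stand-in:

```python
import numpy as np

a = np.arange(45, dtype=float).reshape(3, 15)  # hypothetical stand-in

# A hand-built boolean mask that keeps every column except 1 and 2
keep = np.ones(a.shape[1], dtype=bool)
keep[[1, 2]] = False

by_mask = a[:, keep]
by_delete = np.delete(a, [1, 2], axis=1)
by_r_ = a[:, np.r_[0, 3:a.shape[1]]]

# All three select the same values...
print(np.array_equal(by_mask, by_delete) and np.array_equal(by_delete, by_r_))  # True
# ...and each one is a new array that does not share memory with a
print(np.shares_memory(a, by_mask), np.shares_memory(a, by_delete), np.shares_memory(a, by_r_))  # False False False
# A plain basic slice, by contrast, is a view
print(np.shares_memory(a, a[:, 3:]))  # True
```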
loadtxt also accepts a usecols parameter that takes a similar column-index array.
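A sketch of skipping the month and day columns at load time, assuming the total column count (15 here: three date columns plus 12 locations) is known in advance. An io.StringIO holding the question's sample rows stands in for the file wind.data:

```python
import io
import numpy as np

# The question's sample rows, standing in for the file 'wind.data'
text = io.StringIO(
    "61 1 1 15.04 14.96 13.17 9.29 13.96 9.87 13.67 10.25 10.83 12.58 18.50 15.04\n"
    "61 1 2 14.71 16.88 10.83 6.50 12.62 7.67 11.50 10.04 9.79 9.67 17.54 13.83\n"
    "61 1 3 18.50 16.88 12.33 10.13 11.17 6.17 11.25 8.04 8.50 7.67 12.75 12.71\n"
)

n_cols = 15  # assumed known: 3 date columns + 12 locations

# Read only the year column and the location columns
data = np.loadtxt(text, usecols=np.r_[0, 3:n_cols].tolist())
print(data.shape)  # (3, 13)
```

Unlike the in-memory approaches, the unwanted columns are never stored in the resulting array.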
Upvotes: 2
Reputation: 476547
You can use np.delete [numpy-doc] for that, passing a slice object as the index to remove:
>>> np.delete(data, slice(1, 3), 1)
array([[61. , 15.04, 14.96, 13.17, 9.29, 13.96, 9.87, 13.67, 10.25,
10.83, 12.58, 18.5 , 15.04],
[61. , 14.71, 16.88, 10.83, 6.5 , 12.62, 7.67, 11.5 , 10.04,
9.79, 9.67, 17.54, 13.83],
[61. , 18.5 , 16.88, 12.33, 10.13, 11.17, 6.17, 11.25, 8.04,
8.5 , 7.67, 12.75, 12.71]])
When you use slicing notation, under the hood you basically pass a slice object. Indeed, a[1:3] is equivalent to a[slice(1,3)].
Furthermore, the 1 here specifies the axis along which to delete. Since we want to remove columns, i.e. entries along the second axis, we pass 1 as the third parameter.
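A short sketch of the slice equivalence and of three interchangeable ways to tell np.delete which columns to drop. np.s_ is used here as a convenience for building a slice with colon syntax; the 3 x 15 array is a hypothetical stand-in for the wind data:

```python
import numpy as np

a = np.arange(45, dtype=float).reshape(3, 15)  # hypothetical stand-in

# The expression 1:3 inside square brackets becomes slice(1, 3)
print(np.array_equal(a[:, 1:3], a[:, slice(1, 3)]))  # True
# np.s_ builds the same slice object using the colon syntax
print(np.s_[1:3] == slice(1, 3))  # True

# Drop columns 1 and 2 along axis 1, three equivalent ways
d1 = np.delete(a, slice(1, 3), 1)
d2 = np.delete(a, np.s_[1:3], axis=1)
d3 = np.delete(a, [1, 2], axis=1)
print(d1.shape, np.array_equal(d1, d2), np.array_equal(d1, d3))  # (3, 13) True True
```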
Upvotes: 1