Reputation: 1337
Suppose I have the following 2-D numpy array:
[[1, 3., 'John Doe', 'male', 'doc', '25'],
...,
[9, 6., 'Jane Doe', 'female', 'p', '28']]
I need to extract the data relevant to my task.
Being a novice in numpy and Python in general, I would do it in the following manner:
import numpy as np

data = np.array(
    [[1, 3., 'John Doe', 'male', 'doc', 25],
     [9, 6., 'Jane Doe', 'female', 'p', 28]]
)
data_tr = np.zeros((data.shape[0], 3))
for i in range(data.shape[0]):
    data_tr[i][0] = data[i, 1]
    data_tr[i][1] = 0 if data[i, 3] == 'male' else 1
    data_tr[i][2] = data[i, 5]
And as a result I have the following:
[[ 3., 0., 25.],
[ 6., 1., 28.]]
What I would like to know is whether there is a more efficient or cleaner way to do this.
Can anybody please help me with that?
Upvotes: 6
Views: 78
Reputation: 221624
One approach uses column indexing:
data_tr = np.zeros((data.shape[0], 3))
data_tr[:,[0,2]] = data[:, [1,5]]
data_tr[:,1] = data[:,3]!='male'  # 0 for 'male', 1 otherwise, as in the question
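For reference, here is a self-contained sketch of the same column-based idea that can be run as-is. It assumes the two sample rows from the question. The explicit `astype(float)` conversions and the use of `np.column_stack` are my own additions rather than part of the approach above: `np.array` on mixed values gives an array of strings, so the conversions make the string-to-number step visible. The gender column is encoded as 0 for 'male' and 1 otherwise, as in the question's loop.

```python
import numpy as np

# Sample rows from the question; mixed values are stored as strings.
data = np.array(
    [[1, 3., 'John Doe', 'male', 'doc', 25],
     [9, 6., 'Jane Doe', 'female', 'p', 28]]
)

# Build the result column by column, without a Python loop.
data_tr = np.column_stack((
    data[:, 1].astype(float),              # second column as numbers
    (data[:, 3] != 'male').astype(float),  # 0 for 'male', 1 otherwise
    data[:, 5].astype(float),              # last column as numbers
))
print(data_tr)
```

This should reproduce the 2x3 result shown in the question.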
Note that the step data_tr[:,[0,2]] = data[:, [1,5]] works with copies of the respective columns, because indexing with a list of column numbers is advanced (fancy) indexing. Such copies are not very efficient for assignment and extraction, so you might want to do it in two separate steps, mostly for performance, like so:
data_tr[:,0] = data[:, 1]
data_tr[:,2] = data[:, 5]
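To check the copy-versus-view distinction yourself, a small sketch along these lines may help. The array `a` and the variable names are made up for illustration, and `np.shares_memory` is used only as a diagnostic.

```python
import numpy as np

# Hypothetical example array, unrelated to the question's data.
a = np.arange(12.0).reshape(3, 4)

fancy = a[:, [1, 3]]  # list of column indices: advanced indexing, new array
single = a[:, 1]      # slice plus integer: basic indexing, view into a

fancy_shares = np.shares_memory(a, fancy)
single_shares = np.shares_memory(a, single)
print(fancy_shares, single_shares)
```

The first check should come out False (a copy) and the second True (a view), which is why selecting one column at a time avoids the intermediate copy.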
Upvotes: 4