Dmitry Volkov

Reputation: 1337

Extracting and transforming data in numpy

Suppose I have the following 2D numpy array:

[[1, 3., 'John Doe', 'male', 'doc', '25'],
  ...,
 [9, 6., 'Jane Doe', 'female', 'p', '28']]

I need to extract the data relevant to my task.

Being a novice in numpy and python in general, I would do it in the following manner:

import numpy as np

data = np.array(
[[1, 3., 'John Doe', 'male', 'doc', 25],
 [9, 6., 'Jane Doe', 'female', 'p', 28]]
)

data_tr = np.zeros((data.shape[0], 3))
for i in range(0, data.shape[0]):
    data_tr[i][0] = data[i, 1]
    data_tr[i][1] = 0 if data[i, 3] == 'male' else 1
    data_tr[i][2] = data[i, 5]

And as a result I have the following:

[[  3.,   0.,  25.],
 [  6.,   1.,  28.]]

What I would like to know is whether there is a more efficient or cleaner way to do this.
Can anybody please help me with that?

Upvotes: 6

Views: 78

Answers (1)

Divakar

Reputation: 221624

One approach with column-indexing -

data_tr = np.zeros((data.shape[0], 3))
data_tr[:,[0,2]] = data[:, [1,5]]
data_tr[:,1] = data[:,3]!='male'  # 0 for 'male', 1 otherwise, as in the question's loop

Note that the step data_tr[:,[0,2]] = data[:, [1,5]] indexes with lists, so data[:, [1,5]] creates a copy of the respective columns rather than a view. That is not very efficient for extraction and assignment. So, you might want to do that in two separate steps, mostly for performance, like so -

data_tr[:,0] = data[:, 1]
data_tr[:,2] = data[:, 5]
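
Putting it together, here is a minimal self-contained sketch using the sample rows from the question. Two things in it are my own assumptions rather than part of the approach above: since the mixed-type list is stored by numpy as an array of strings, I convert the numeric columns explicitly with astype(float), and I use != 'male' so that 'male' maps to 0 as in the expected output. The np.shares_memory checks are only there to illustrate the copy vs. view point -

```python
import numpy as np

# Mixed numbers and strings: numpy stores every element as a string here.
data = np.array(
    [[1, 3., 'John Doe', 'male', 'doc', 25],
     [9, 6., 'Jane Doe', 'female', 'p', 28]]
)

data_tr = np.zeros((data.shape[0], 3))

# Column slices like data[:, 1] are views; astype(float) makes the
# string-to-number conversion explicit (an assumption added for clarity).
data_tr[:, 0] = data[:, 1].astype(float)
data_tr[:, 2] = data[:, 5].astype(float)

# Boolean mask: False (0) for 'male', True (1) otherwise.
data_tr[:, 1] = data[:, 3] != 'male'

print(data_tr)

# Basic slicing shares memory with the original, list indexing does not.
is_view = np.shares_memory(data, data[:, 1])
is_copy = not np.shares_memory(data, data[:, [1, 5]])
print(is_view, is_copy)
```

If the numeric columns can contain values that do not parse as floats, astype(float) will raise an error, so you would need to clean those rows first.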

Upvotes: 4
