user3575732

Reputation: 23

numpy returns 1d array and 2d array for same code

I'm not sure what rules NumPy follows when deciding whether the result of a 2d array operation comes back as a 1d or a 2d array. Consider the following piece of code:

idx_cls_samples = sample_data[:, -1] == c
v_feature = sample_data[idx_cls_samples, f]

f_values = sample_data[[sample_data[:, -1] == c], f]

Note that the last line is simply the first two lines combined into one.

The result of the first two lines is a NumPy vector of the form array([1, 2, 3, ...]), while the result of the last line is array([[1, 2, 3, ...]]). I believe the result should have been array([[1], [2], [3], ...]) in both cases. How can I figure out beforehand which format NumPy will return?

Upvotes: 1

Views: 135

Answers (2)

hpaulj

Reputation: 231540

sample_data is 2d. sample_data[:,-1] is 1d, the last column. Indexing with a scalar removes a dimension.

The ... == c comparison produces a boolean array of the same dimension (1d).

sample_data[:, f] is also 1d, the fth column.

Indexing that with a boolean array returns a result with the same number of dimensions as the boolean, containing just a subset of the values.
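The steps so far can be checked by printing shapes. This is only a sketch: the array contents and the values of c and f are made up, since the question doesn't show the real data.

```python
import numpy as np

# Made-up data: two feature columns, class label in the last column.
sample_data = np.array([
    [1.0, 10.0, 0],
    [2.0, 20.0, 1],
    [3.0, 30.0, 0],
    [4.0, 40.0, 1],
])
c = 0  # class to select (assumed value)
f = 1  # feature column (assumed value)

last_col = sample_data[:, -1]     # scalar index on axis 1 drops that axis: 1d
mask = last_col == c              # boolean array, same shape as last_col
v_feature = sample_data[mask, f]  # rows where mask is True, column f: 1d

print(last_col.shape, mask.shape, v_feature.shape)
print(v_feature)
```

With this data the mask picks rows 0 and 2, so v_feature holds the two values 10.0 and 30.0 in a 1d array.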

sample_data[idx, f] is 1d, sample_data[[idx], f] is 2d (due to the added []).

You probably wanted sample_data[(sample_data[:, -1] == c), f], where the () just groups the expression, sometimes for operator precedence, sometimes just for readability (but beware of (...,), which makes a tuple).

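sample_data[idx][:, [f]] (or sample_data[idx, f][:, None]) would have given you the column 'vector', 2d with 1 column. Note that sample_data[idx, [f]] is still 1d, because the boolean idx and the list [f] are broadcast against each other.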
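If a 2d column is what you're after, the shapes are easy to check directly. A small sketch with a made-up array; the names flat, col_a, col_b and col_c are mine, not from the question.

```python
import numpy as np

sample_data = np.arange(12).reshape(4, 3)  # made-up 4x3 array
idx = np.array([True, False, True, True])  # boolean row mask (assumed)
f = 1                                      # column to pick (assumed)

flat = sample_data[idx, f]                  # 1d, shape (3,)
col_a = sample_data[idx][:, [f]]            # rows first, then a list index keeps the axis
col_b = sample_data[idx, f][:, None]        # add a trailing axis to the 1d result
col_c = sample_data[idx, f].reshape(-1, 1)  # same thing via reshape

print(flat.shape, col_a.shape, col_b.shape, col_c.shape)
```

All three column variants hold the same values as flat, just arranged as 3 rows by 1 column.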

Another way to look at sample_data[idx,f] is: idx selects a subset of rows, f selects a column from that 2d.

Often 2d (or higher nd) indexing can be studied axis by axis; that's especially true when an index is a scalar or a slice. It's more complicated if an index is a list or array, or worse, if 2 or more of them are.
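To illustrate why two list or array indices are the complicated case, here is a rough sketch on a made-up array. The names rows, cols, paired, block and stepwise are just for illustration.

```python
import numpy as np

a = np.arange(12).reshape(4, 3)  # made-up 4x3 array
rows = [0, 2]
cols = [0, 1]

# Two index lists are broadcast together and paired element by element:
# this picks a[0, 0] and a[2, 1], not a 2x2 block.
paired = a[rows, cols]

# np.ix_ builds indices that select the full rows-by-cols block.
block = a[np.ix_(rows, cols)]

# Indexing one axis at a time gives the same block.
stepwise = a[rows][:, cols]

print(paired)
print(block)
print(stepwise)
```

So the axis-by-axis reading matches block and stepwise, while the combined a[rows, cols] form follows the broadcasting rule instead.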

Upvotes: 0

user2357112

Reputation: 281584

"Note that the last line is simply the first two lines combined into one."

No, it's not. You stuck an extra pair of brackets in there:

f_values = sample_data[[sample_data[:, -1] == c], f]
#                      ^                       ^

Take them out.

As for the indexing rules, those are in the documentation. They're pretty long.

Upvotes: 2
