Reputation:
I have an array of 2D, called X and a 1D array for X's classes, what i want to do is slice the same amount of first N percent elements for each class and store inside a new array, for example, in a simple way without doing for loops:
For the following X array which is 2D:
[[0.612515 0.385088 ]
[0.213345 0.174123 ]
[0.432596 0.8714246]
[0.700230 0.730789 ]
[0.455105 0.128509 ]
[0.518423 0.295175 ]
[0.659871 0.320614 ]
[0.459677 0.940614 ]
[0.823733 0.831789 ]
[0.236175 0.10750 ]
[0.379032 0.241121 ]
[0.512535 0.8522193]
Output is 3.
Then, i'd like to store the first 3 index that belongs to class 0 and first 3 elements that belongs to class 0 and maintain the occurence order of the indices, the following output:
First 3 from each class: [1 0 0 1 0 1]
New_X =
[[0.612515 0.385088 ]
[0.213345 0.174123 ]
[0.432596 0.8714246]
[0.700230 0.730789 ]
[0.455105 0.128509 ]
[0.518423 0.295175 ]]
Upvotes: 1
Views: 51
Reputation: 8308
First, 30% is only 2 elements from each class (even when using np.ceil
).
Second, I'll assume both arrays are numpy.array
.
Given the 2 arrays, we can find the desired indices using np.where
and array y
in the following way:
in_ = sorted([x for x in [*np.where(y==0)[0][:np.ceil(0.3*6).astype(int)],*np.where(y==1)[0][:np.ceil(0.3*6).astype(int)]]]) # [0, 1, 2, 3]
Now we can simply slice X
like so:
X[in_]
# array([[0.612515 , 0.385088 ],
# [0.213345 , 0.174123 ],
# [0.432596 , 0.8714246],
# [0.70023 , 0.730789 ]])
The definition of X
and y
are:
X = np.array([[0.612515 , 0.385088 ],
[0.213345 , 0.174123 ],
[0.432596 , 0.8714246],
[0.70023 , 0.730789 ],
[0.455105 , 0.128509 ],
[0.518423 , 0.295175 ],
[0.659871 , 0.320614 ],
[0.459677 , 0.940614 ],
[0.823733 , 0.831789 ],
[0.236175 , 0.1075 ],
[0.379032 , 0.241121 ],
[0.512535 , 0.8522193]])
y = np.array([1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0])
The following line: np.where(y==0)[0][:np.ceil(0.3*6).astype(int)]
doing the following:
np.where(y==0)[0]
- returns all the indices where y==0
[:np.ceil(0.3*6).astype(int)]
Upvotes: 1