Reputation: 7451
I have a 2D Python array, from which I would like to remove certain columns, but I don't know how many I would like to remove until the code runs.
I want to loop over the columns in the original array, and if the sum of the entries in a column is above a certain value, I want to remove that whole column.
I started to do this the following way:
new_index = 0
for i in range(original_number_of_columns):
    if sum(original_array[:, i]) < certain_value:
        # keep column i; new_array must already exist with enough columns
        new_array[:, new_index] = original_array[:, i]
        new_index += 1
But then I realised that I was going to have to define new_array first, and tell Python what size it is. But I don't know what size it is going to be beforehand.
I have worked around this before by first looping over the columns to count how many I will drop, then creating new_array, and finally running the loop above, but there must be a more efficient way to do this.
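A minimal sketch of one way to keep the loop without pre-sizing anything, assuming original_array is a NumPy array: record the indices of the columns to keep in a Python list (which grows freely), then select all of them in one indexing step. The example data and threshold below are hypothetical stand-ins for original_array and certain_value.

```python
import numpy as np

# Hypothetical example data and threshold, standing in for original_array
# and certain_value from the question.
original_array = np.array([[1, 9, 2],
                           [3, 8, 1],
                           [0, 7, 4]])
certain_value = 10

# A Python list can grow as the loop runs, so nothing has to be pre-sized.
keep = []
for i in range(original_array.shape[1]):
    # keep the column only if its sum is below the threshold
    if original_array[:, i].sum() < certain_value:
        keep.append(i)

# Select all kept columns at once; the result has exactly len(keep) columns.
new_array = original_array[:, keep]
print(new_array.tolist())  # [[1, 2], [3, 1], [0, 4]]
```

Here the column sums are 4, 24 and 7, so only the middle column is dropped.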
Thank you.
Upvotes: 1
Views: 1707
Reputation: 113950
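Without numpy:
my_2d_table = [[...],[...],...]
# zip(*rows) transposes the table, so each col is one column of the original
only_cols_that_sum_lt_x = [col for col in zip(*my_2d_table) if sum(col) < some_threshold]
# transpose back to rows; build lists explicitly since map/zip are lazy in Python 3
new_table = [list(row) for row in zip(*only_cols_that_sum_lt_x)]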
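A concrete run of the zip-based approach, using a hypothetical 3x3 table and threshold (the variable names mirror the snippet above). It shows the double transpose: zip(*rows) turns rows into columns, the filter drops columns, and a second zip turns the survivors back into rows.

```python
# Hypothetical 3x3 table and threshold, used only for illustration.
my_2d_table = [[1, 2, 3],
               [1, -3, 2],
               [4, 5, 7]]
some_threshold = 5

# zip(*rows) yields the columns as tuples: (1, 1, 4), (2, -3, 5), (3, 2, 7).
kept_cols = [col for col in zip(*my_2d_table) if sum(col) < some_threshold]

# Transpose the kept columns back into rows, as lists.
new_table = [list(row) for row in zip(*kept_cols)]
print(new_table)  # [[2], [-3], [5]]
```

One caveat of this pure-Python version: if no column passes the test, new_table is an empty list, so the number of rows is lost; NumPy indexing keeps an (n_rows, 0) shape instead.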
With numpy:
import numpy as np
a = np.array(my_2d_table)
a[:, np.sum(a, 0) < some_threshold]  # keep only columns whose sum is below the threshold
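The boolean-mask indexing can be checked on a small hypothetical table. It also illustrates that the number of surviving columns never has to be known in advance, even when it turns out to be zero.

```python
import numpy as np

# Hypothetical 2x4 table; the thresholds below are illustrative values.
a = np.array([[5, 1, 2, 0],
              [6, 0, 3, 1]])

col_sums = a.sum(axis=0)  # column sums: [11, 1, 5, 1]

# One boolean per column; True means "keep this column".
mask = col_sums < 4
kept = a[:, mask]  # columns 1 and 3 survive
print(kept.tolist())  # [[1, 0], [0, 1]]

# If no column passes the test, the result still has the original row count.
none_kept = a[:, col_sums < 0]
print(none_kept.shape)  # (2, 0)
```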
Upvotes: 3
Reputation: 21079
I suggest using numpy.compress.
>>> import numpy as np
>>> a = np.array([[1, 2, 3], [1, -3, 2], [4, 5, 7]])
>>> a
array([[ 1,  2,  3],
       [ 1, -3,  2],
       [ 4,  5,  7]])
>>> a.sum(axis=0)  # sum of each column
array([ 6,  4, 12])
>>> a.sum(0) < 5
array([False,  True, False], dtype=bool)
>>> a.compress(a.sum(0) < 5, axis=1)  # keep only the columns where the condition is True
array([[ 2],
       [-3],
       [ 5]])
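With a one-dimensional boolean condition, compress along axis=1 selects the same columns as boolean indexing, and NumPy also exposes it as the function np.compress. A quick check on the same array:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [1, -3, 2],
              [4, 5, 7]])
condition = a.sum(axis=0) < 5  # [False, True, False]

via_method = a.compress(condition, axis=1)       # ndarray method
via_function = np.compress(condition, a, axis=1)  # module-level function
via_mask = a[:, condition]                         # boolean indexing

# All three forms keep only the middle column.
print(np.array_equal(via_method, via_function), np.array_equal(via_method, via_mask))  # True True
print(via_mask.tolist())  # [[2], [-3], [5]]
```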
Upvotes: 2
Reputation: 142136
You can use the following:
import numpy as np

a = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])
# column sums are 12, 15, 18, so only the last column passes "> 15"
print(a.compress(a.sum(0) > 15, 1))
[[3]
 [6]
 [9]]
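Note that this example keeps the columns whose sum is above 15, whereas the question asks to remove such columns. A small wrapper that follows the question's criterion (keep a column only if its sum is below the limit, the same strict comparison as the question's loop) might look like this; the function name keep_columns_with_sum_below is hypothetical.

```python
import numpy as np

def keep_columns_with_sum_below(a, limit):
    """Return the columns of 2D array `a` whose sum is strictly below `limit`.

    Hypothetical helper; it simply wraps ndarray.compress along axis 1.
    """
    a = np.asarray(a)
    keep = a.sum(axis=0) < limit  # True for columns that stay
    return a.compress(keep, axis=1)

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
# Column sums are 12, 15, 18; with the strict "<", the column summing to 15 is dropped too.
print(keep_columns_with_sum_below(a, 15).tolist())  # [[1], [4], [7]]
```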
Upvotes: 3