Reputation: 1471
I need to return the number of unreasonable (NaN or out-of-range) values for the 3rd column, which has 0s and a blank in it. In the real problem I have to deal with a CSV file, but for now I just created an ndarray.
data = np.array([[1, 2000, 143, 4546], [2, 1999, 246, 0], [3, 2008, 190, np.nan], [4, 2000, 100, 0]])  # the blank is written as np.nan so the array is rectangular
I can't even think of where to start.
It would be awesome if someone could help.
Upvotes: 0
Views: 57
Reputation: 2164
First, you need to be able to access just the column that you're interested in. Do this with a slice:
data[:,2] # grab all rows, and just the column with index 2
Now you want to count the entries that are NaN:
np.count_nonzero(np.isnan(data[:,2]))
And we want to count the number of zero elements:
data[:,2].size - np.count_nonzero(data[:,2])
And if we add those together:
data[:,2].size - np.count_nonzero(data[:,2]) + np.count_nonzero(np.isnan(data[:,2]))
This is boring, though, since the 3rd column doesn't have any 0 or NaN in it. Let's try with the last column:
>>> col = data[:,3]  # named col so the built-in slice is not shadowed
>>> col.size - np.count_nonzero(col) + np.count_nonzero(np.isnan(col))
3
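Since the real data lives in a CSV file, here is a minimal end-to-end sketch of the steps above. It assumes the file looks like the array in the question with the last field of the third row left blank; csv_text and the helper name count_zero_or_nan are made up for illustration. np.genfromtxt is used because it fills missing float fields with NaN by default.

```python
import io
import numpy as np

# Hypothetical CSV contents mirroring the question's array; the third row
# leaves its last field blank.
csv_text = """1,2000,143,4546
2,1999,246,0
3,2008,190,
4,2000,100,0
"""

# With the default float dtype, genfromtxt turns the blank field into NaN.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",")


def count_zero_or_nan(column):
    """Count entries that are either 0 or NaN (illustrative helper)."""
    zeros = column.size - np.count_nonzero(column)
    nans = np.count_nonzero(np.isnan(column))
    return zeros + nans


print(count_zero_or_nan(data[:, 2]))  # 0
print(count_zero_or_nan(data[:, 3]))  # 3
```

For a file on disk, passing the path to np.genfromtxt in place of the StringIO object should behave the same way.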
Edit: I should explain why this works:
np.isnan(data[:,2])
gives an array of True and False values based on whether each element is NaN. True, when treated as a number, is converted to 1 and False is converted to 0, so the np.count_nonzero call counts the number of 1s, which represent the NaN values.
np.count_nonzero(data[:,2])
counts the non-zero elements directly. If we subtract the number of non-zero elements from the total number of elements, we get the number of 0s.
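To make the explanation concrete, here is a small sketch with illustrative values (the array col is made up to match the last column discussed above). It also shows one detail worth knowing: NaN itself counts as non-zero, so a NaN is never double-counted as a zero when the two counts are added.

```python
import numpy as np

# Illustrative column: one real value, two zeros, one NaN.
col = np.array([4546.0, 0.0, np.nan, 0.0])

nan_mask = np.isnan(col)            # boolean array, True where NaN
print(nan_mask.astype(int))         # [0 0 1 0]
nan_count = np.count_nonzero(nan_mask)
print(nan_count)                    # 1

# NaN is treated as non-zero, so only the two real zeros are left over.
nonzero_count = np.count_nonzero(col)
zero_count = col.size - nonzero_count
print(nonzero_count, zero_count)    # 2 2

print(zero_count + nan_count)       # 3
```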
Upvotes: 1