Reputation: 449
How can I count or read only the actual entries of a column, as distinct from non-zero entries?
In other words, if I have the file:
4000,1,5221,0
4001,0,5222,1
4002,3,,,
column 4 has 2 actual entries, whereof one vanishes. I can count entries like so:
R = csvread("bugtest.csv");
for i = 1:4
  VanishingColEntries(i) = numel (find (R(:,i) == 0));
  NonVanishingColEntries(i) = nnz (R(:,i));
endfor
VanishingColEntries
NonVanishingColEntries
yielding:
octave:2> nument
VanishingColEntries =
0 1 1 2
NonVanishingColEntries =
3 2 2 1
But I don't know how to extract the number of "actual" entries, that is, the sum of the nonzero and the explicitly zero entries!
Upvotes: 1
Views: 128
Reputation: 22245
csvread is only for numeric data. If csvread encounters an entry which is not strictly numeric, it checks whether the string starts with a number, and uses that number as the result (e.g. 1direction, 2pac, 7up will result in 1, 2, 7). 'Empty' entries are effectively treated as an empty string, which is parsed as the number 0. However, there are some special strings, like nan and inf, which are parsed specially.
If you can (and are happy to) preprocess your csv file, then you can replace all empty entries with the string nan (without quotes). csvread will then treat this string specially and replace it with an actual nan value in the resulting numerical matrix. You can then use isnan to count the number of nan / non-nan entries as follows:
R = csvread( 'bugtest.csv' );
% Count nan / non-nan entries in each column (sum over dimension 1)
EmptyColEntries = sum( isnan( R ), 1 )    % entries that were empty in the file
ActualColEntries = sum( ~isnan( R ), 1 )  % entries actually present, zero or not
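If editing the file by hand is not practical, the replacement of empty entries with nan can itself be scripted. The following is only a sketch under a few assumptions: the file is plain comma-separated text with no quoted fields, and fill_empty_fields is a hypothetical helper name of my own, not part of Octave or any package.

```octave
1;  % mark this file as a script so the function below can be defined inline

% Hypothetical helper: return CSV text with every empty field replaced by "nan".
% Assumes plain comma-separated values without quoted fields.
function out = fill_empty_fields (txt)
  lines = strsplit (strtrim (txt), "\n");
  for k = 1 : numel (lines)
    % Do not collapse consecutive commas, otherwise empty fields would vanish
    fields = strsplit (strtrim (lines{k}), ',', 'CollapseDelimiters', false);
    is_empty = cellfun (@(f) isempty (strtrim (f)), fields);
    fields(is_empty) = {'nan'};
    lines{k} = strjoin (fields, ',');
  endfor
  out = [strjoin(lines, "\n"), "\n"];
endfunction

% Usage (the output file name is just an example):
%   fid = fopen ('bugtest_nan.csv', 'w');
%   fputs (fid, fill_empty_fields (fileread ('bugtest.csv')));
%   fclose (fid);
%   R = csvread ('bugtest_nan.csv');
```

Writing to a separate file keeps the original data untouched, which seems the safer choice if the csv is produced by another tool.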
If you do not have the luxury of preprocessing your csv file (or you simply want to process it programmatically throughout, without the need for human intervention), then you can use the csv2cell function from the io package instead, and process the resulting cell array to get what you want, e.g.
pkg load io
C = csv2cell( 'bugtest.csv' )
% Replace every cell holding a string (here: the empty entries) with nan
for i = 1 : numel(C)
  if ischar(C{i})
    C{i} = nan;
  endif
endfor
% Convert numeric cell array (nan is a valid number) to a matrix
R = cell2mat( C );
You can then use isnan as before to get your result.
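To tie this back to the original question: once empty entries are nan, the "actual", explicitly zero, and nonzero counts can all be read off the same matrix. A minimal sketch, with R typed in by hand to stand in for the parsed file (the variable names are my own choice):

```octave
% Matrix as it would look once the empty entries have become nan
R = [4000 1 5221   0;
     4001 0 5222   1;
     4002 3  NaN NaN];

ActualColEntries  = sum (~isnan (R), 1)            % entries present in the file
ZeroColEntries    = sum (R == 0, 1)                % entries explicitly equal to 0
NonZeroColEntries = sum (R ~= 0 & ~isnan (R), 1)   % nonzero entries, nan excluded
EmptyColEntries   = sum (isnan (R), 1)             % entries that were empty

% ActualColEntries  -> 3 3 2 2
% ZeroColEntries    -> 0 1 0 1
% NonZeroColEntries -> 3 2 2 1
% EmptyColEntries   -> 0 0 1 1
```

Note that nnz counts nan as a nonzero value, so the explicit ~isnan mask is used for the nonzero count rather than nnz on its own.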
Upvotes: 3