Ludi

Reputation: 449

Count or read only actual entries

How can I count or read only the actual entries of a column, as distinct from non-zero entries?

In other words, if I have the file:

4000,1,5221,0
4001,0,5222,1
4002,3,,,

Column 4 has 2 actual entries, one of which is zero. I can count the zero and nonzero entries like so:

R = csvread("bugtest.csv");

% Per column: count entries equal to zero and entries different from zero
for i = 1:4
  VanishingColEntries(i) = numel(find(R(:,i) == 0));
  NonVanishingColEntries(i) = nnz(R(:,i));
endfor
VanishingColEntries
NonVanishingColEntries

yielding:

octave:2> nument
VanishingColEntries =

   0   1   1   2

NonVanishingColEntries =

   3   2   2   1

But I don't know how to extract the number of "actual" entries, that is, the sum of the nonzero and the explicitly zero entries!

Upvotes: 1

Views: 128

Answers (1)

Tasos Papastylianou

Reputation: 22245

csvread only handles numeric data. If it encounters an entry that is not strictly numeric, it checks whether the string starts with a number and uses that as the result (e.g. 1direction, 2pac, 7up become 1, 2, 7). An empty entry is effectively treated as an empty string, which is parsed as the number 0. A few special strings, such as nan and inf, are parsed specially.

If you are able and willing to preprocess your csv file, you can replace all empty entries with the string nan (without quotes). csvread will then turn each of them into an actual nan value in the resulting numeric matrix. You can then use isnan to count the nan / non-nan entries in each column as follows:

  R = csvread( 'bugtest.csv' );

% Count nan (missing) and non-nan (actual) entries in each column
% (sum along dimension 1, i.e. down the rows of each column)
  MissingColEntries = sum(  isnan( R ), 1 )
  ActualColEntries  = sum( ~isnan( R ), 1 )
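The preprocessing step itself can also be scripted. The following is only a minimal sketch, assuming a plain comma-separated file with no quoted fields; the variable names (txt, rows, fields, cleaned) are illustrative and not part of the original answer. The sample text is copied from the question so that the snippet runs on its own.

```octave
% Sample text copied from the question. For a real file one would
% presumably start from: txt = fileread('bugtest.csv');
txt = "4000,1,5221,0\n4001,0,5222,1\n4002,3,,,\n";

txt  = strrep(txt, "\r", "");           % drop Windows carriage returns, if any
rows = strsplit(strtrim(txt), "\n");    % one cell per line

for k = 1:numel(rows)
  % Keep empty fields: by default strsplit collapses consecutive delimiters
  fields = strsplit(rows{k}, ",", "CollapseDelimiters", false);
  fields(cellfun(@isempty, fields)) = {"nan"};   % empty field -> nan
  rows{k} = strjoin(fields, ",");
endfor

cleaned = strjoin(rows, "\n");
disp(cleaned)
```

The resulting text can then be written to a new file (for example with fopen, fputs and fclose) and read with csvread. Note that the third sample row, as posted, ends in three commas, so it yields three nan fields rather than two.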

If you do not have the luxury of preprocessing your csv file (or you simply want to process it programmatically throughout, without the need for human intervention), then you can use the csv2cell function from the io package instead, and process the resulting cell to get what you want, e.g.

  pkg load io
  C = csv2cell( 'bugtest.csv' )

% Convert empty or non-numeric (string) cells to nan
  for i = 1 : numel(C), if ischar(C{i}) || isempty(C{i}), C{i} = nan; endif, endfor

% Convert numeric cell array (nan is a valid number) to a matrix
  R = cell2mat( C );
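The same conversion can be written without an explicit loop. This is a sketch only: the cell array below is typed in by hand to stand in for what csv2cell might return for a four-column version of the sample file (that shape is an assumption, not verified output of the io package), and isMissing is an illustrative name.

```octave
% Hand-written stand-in for the cell array returned by csv2cell
C = {4000, 1, 5221, 0;
     4001, 0, 5222, 1;
     4002, 3, '',   ''};

% Logical mask of cells that hold a string or nothing at all
isMissing = cellfun(@(x) ischar(x) || isempty(x), C);

% Replace those cells with nan, then convert to a numeric matrix
C(isMissing) = {nan};
R = cell2mat(C)
```

Checking isempty in addition to ischar is a defensive choice, so the mask also catches empty numeric cells.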

You can then use isnan as before to get your result.
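To tie this back to the question, here is a short sketch that separates actual, zero and nonzero entries per column once the missing entries are nan. The matrix is written out by hand and assumes the sample file has four columns with the last two entries of row 3 missing; the variable names are illustrative.

```octave
% Sample data with missing entries already represented as nan
R = [4000, 1, 5221, 0;
     4001, 0, 5222, 1;
     4002, 3, nan,  nan];

present = ~isnan(R);                              % true where an entry exists

ActualColEntries  = sum(present, 1)               % zero or nonzero, but present
ZeroColEntries    = sum(R == 0, 1)                % explicitly zero (nan == 0 is false)
NonzeroColEntries = sum(R ~= 0 & present, 1)      % nan ~= 0 is true, so mask it out
```

For this matrix the counts are 3 3 2 2 (actual), 0 1 0 1 (zero) and 3 2 2 1 (nonzero), which matches the question's statement that column 4 has 2 actual entries, one of which is zero.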

Upvotes: 3
