Gerry Harp

Reputation: 159

Octave strread can't return parsed results to an array (?)

In Octave, I am reading very large text files from disk and parsing them. The function textread() does just what I want except for the way it is implemented. Looking at the source, textread.m pulls the entire text file into memory before attempting to parse lines. If the text file is large, it fills all my free RAM (16 GB) with text and then starts saving back to disk (virtual memory), before parsing. If I wait long enough, textread() will complete, but it takes almost forever.

Notice that after parsing into a matrix of floating point values, the same data fit into memory quite easily. So I'm using textread() in an intermediate zone, where there is enough memory for the floats, but not enough memory for the same data as text.

All of that is preparation for my question, which is about strread(). The data in my text files look like this:

0.0647148      -2.0072535       0.5644875       8.6954257
0.1294296      -8.4689583       0.6567095       144.3090450
0.1941444      -9.2658037      -1.0228742       173.8027785
0.2588593      -6.5483359      -1.5767574       90.7337329
0.3235741      -0.7646807      -0.5320896       1.7357120

... and so on. There are no header lines or comments in the file.

I wrote a function that reads the file line by line. Note the two ways I'm attempting to use strread() to parse a line of data:

function dest = readPowerSpectrumFile(filename, dest)

  % read enough lines to fill destination array
  [rows, cols] = size(dest);

  fid = fopen(filename, 'r');

  for line = 1 : rows
    lstr = fgetl(fid);

    % this line works, but is very brittle
    [dest(line, 1), dest(line, 2), dest(line, 3), dest(line, 4)]  = strread(lstr, "%f %f %f %f");

    % This line doesn't work. Or anything similar I can think of.
    % dest(line, 1:4) = strread(lstr, "%f %f %f %f");

  endfor

  fclose(fid);

endfunction

Is there an elegant way of having strread return parsed values to an array? Otherwise I'll have to write a new function any time I change the number of columns.

Thanks

Upvotes: 0

Views: 98

Answers (2)

Andy

Reputation: 8091

The format you describe is a plain matrix of floating point values. In this case you can just use load:

d = load ("yourfile");

which is much faster than textread() or strread(). The implementation is in libinterp/corefcn/ls-mat-ascii.cc, in the function read_mat_ascii_data.

Upvotes: 1

beaker

Reputation: 16801

If you feed fprintf more values than are in its format specification, it will reapply the print statement until it's used them up:

>> fprintf("%d %d \n", 1:6)
1 2
3 4
5 6

It appears this also works with strread. If you specify only one value to read, but there are multiple on the current line, it will keep reading them and add them to a column vector. All we need to do is to assign those values to the correct row of dest:
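As a quick check of that behaviour, here is a minimal sketch that parses one of the sample lines from the question on its own. The variable names vals and row are only illustrative.

```octave
% A single "%f" is reapplied until the string is used up,
% so one output argument collects every value on the line.
lstr = "0.0647148      -2.0072535       0.5644875       8.6954257";
vals = strread(lstr, "%f");   % column vector, one element per value
disp(size(vals))              % 4 rows, 1 column for this four-column line
row = vals.';                 % transpose if a row vector is wanted
```

Because the number of values is taken from the line itself, the same call should work unchanged when the number of columns changes.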

function dest = readPowerSpectrumFile(filename, dest)

   % read enough lines to fill destination array
   [rows, cols] = size(dest);

   fid = fopen(filename, 'r');

   for line = 1 : rows
      lstr = fgetl(fid);

      % read all values from current line into column vector 
      % and store values into row of dest
      dest(line,:) = strread(lstr, "%f");
      % this will also work since values are assumed to be numeric by default:
      % dest(line,:) = strread(lstr);
   endfor

   fclose(fid);

endfunction

Output:

readPowerSpectrumFile(filename, zeros(5,4))
ans =

   6.4715e-02  -2.0073e+00   5.6449e-01   8.6954e+00
   1.2943e-01  -8.4690e+00   6.5671e-01   1.4431e+02
   1.9414e-01  -9.2658e+00  -1.0229e+00   1.7380e+02
   2.5886e-01  -6.5483e+00  -1.5768e+00   9.0734e+01
   3.2357e-01  -7.6468e-01  -5.3209e-01   1.7357e+00
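
If the per-line loop turns out to be slow, one possible alternative (not part of the answer above, just a sketch) is to let fscanf read a fixed number of values straight from the file handle. It assumes the file holds only whitespace-separated numbers, as described in the question. The temporary file and the names fname, rows and cols are made up for the example.

```octave
% Write a tiny two-line sample file so the sketch is self-contained.
fname = [tempname() ".txt"];
fid = fopen(fname, "w");
fprintf(fid, "%f %f %f %f\n", [1 2 3 4; 5 6 7 8].');
fclose(fid);

rows = 2;   % number of lines to read
cols = 4;   % number of values per line

fid = fopen(fname, "r");
% fscanf fills its result column by column, so request a cols-by-rows
% block and transpose it to get one file line per matrix row.
dest = fscanf(fid, "%f", [cols, rows]).';
fclose(fid);
unlink(fname);

disp(dest)
```

Since only rows*cols values are requested, this should not need to hold the whole file in memory as text, though I have not measured it on multi-gigabyte files.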

Upvotes: 1
