Fifth-Edition

Reputation: 365

Reading data into MATLAB from a text file

I have a text file with the following structure:

1999-01-04
1,100.00
1,060.00
1,092.50
0
6,225
1,336,605
37
1999-01-05 
1,122.50
1,087.50
1,122.50
0
3,250
712,175
14
...

The file contains repeated sets of eight values (a date followed by seven numbers, each on its own line).

I want to read it into MATLAB and get the values into different vectors. I've tried several different methods, but none of them has worked; each one produced some sort of error.

In case it's important, I'm doing this on a Mac.

Upvotes: 4

Views: 14307

Answers (6)

Kirk Ireson


This version uses regular expressions to check that each line is formatted as expected.

fid = fopen('data.txt','rt');
if fid == -1
   error('Could not open data.txt');
end

% these will be your 8 value arrays
val1 = [];
val2 = [];
val3 = [];
val4 = [];
val5 = [];
val6 = [];
val7 = [];
val8 = [];

linenum = 0; % line number in file
valnum = 0;  % position of the value within its 8-line record (1-8)

while true
   line = fgetl(fid);

   %-- stop at end of file (fgetl returns -1) or at a blank line
   if ~ischar(line) || isempty(strtrim(line))
      fclose(fid);
      break;
   end

   linenum = linenum+1;
   if valnum == 8
      valnum = 1;
   else
      valnum = valnum+1;
   end

   % patterns are anchored so the whole line must match; \s*$ tolerates trailing spaces or \r
   switch valnum
      case 1
         pat = '^(?<yr>\d{4})-(?<mo>\d{2})-(?<dy>\d{2})\s*$'; % val1 (e.g. 1999-01-04)
      case {2, 3, 4}
         pat = '^(?<val>\d*[,]*\d*[,]*\d*[.]\d{2})\s*$'; % val2-val4 (e.g. 1,100.00)  [valid up to 1billion-1]
      case {5, 8}
         pat = '^(?<val>\d+)\s*$'; % val5, val8 (e.g. 0, 37)
      case {6, 7}
         pat = '^(?<val>\d*[,]*\d*[,]*\d+)\s*$'; % val6, val7 (e.g. 6,225 and 1,336,605)  [valid up to 1billion-1]
      otherwise
         error('bad valnum')
   end

   l = regexp(line,pat,'names'); % l is for line
   if numel(l) == 1 % match
      if valnum == 1
         serialtime = datenum(str2double(l.yr),str2double(l.mo),str2double(l.dy)); % convert to MATLAB serial date
         val1 = [val1;serialtime];
      else
         this_val = strrep(l.val,',',''); % strip out thousands separators
         eval(['val',num2str(valnum),' = [val',num2str(valnum),';',this_val,'];']) % append this value to the matching array
      end
   else
      warning(['line number ',num2str(linenum),' skipped (did not pass regexp): ',line]);
   end
end
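The `'names'` option of regexp returns a struct whose fields are the named tokens `(?<name>...)` in the pattern. A minimal check of the date pattern on a single line, outside the loop (the variable names `str`, `tok` and `serialtime` are just for illustration):

```matlab
% One date line; the trailing space mirrors the second date line of the sample file
str = '1999-01-05 ';
pat = '^(?<yr>\d{4})-(?<mo>\d{2})-(?<dy>\d{2})\s*$';

% Struct with char fields yr, mo and dy (empty struct if the line does not match)
tok = regexp(str, pat, 'names');

% Convert the captured text to numbers, then to a MATLAB serial date number
serialtime = datenum(str2double(tok.yr), str2double(tok.mo), str2double(tok.dy));
disp(serialtime)
```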

Upvotes: 0

gnovice

Reputation: 125854

EDIT: This is a shorter version of the code I previously had in my answer...

If you'd like to read your data file directly, without having to preprocess it first as dstibbe suggested, the following should work:

fid = fopen('datafile.txt','rt');
data = textscan(fid,'%s %s %s %s %s %s %s %s','Delimiter','\n');
fclose(fid);
data = [datenum(data{1}) cellfun(@str2double,[data{2:end}])]';

The above code places each set of 8 values into an 8-by-N matrix, with N being the number of 8 line sets in the data file. The date is converted to a serial date number so that it can be included with the other double-precision values in the matrix. The following functions (used in the above code) may be of interest: TEXTSCAN, DATENUM, CELLFUN, STR2DOUBLE.
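Since the question asks for separate vectors, the rows of the 8-by-N matrix can simply be pulled out afterwards. The sketch below runs the same textscan call on a character vector holding the two sample records (textscan accepts a character vector in place of a file identifier), strips the thousands separators with strrep before str2double as a precaution, and passes an explicit 'yyyy-mm-dd' format to datenum. The names `str`, `dates`, `val2` and `val8` are arbitrary:

```matlab
% The two sample records from the question, one value per line
str = sprintf(['1999-01-04\n1,100.00\n1,060.00\n1,092.50\n0\n6,225\n1,336,605\n37\n' ...
               '1999-01-05\n1,122.50\n1,087.50\n1,122.50\n0\n3,250\n712,175\n14']);

% Eight string fields per record, one field per line
c = textscan(str, '%s %s %s %s %s %s %s %s', 'Delimiter', '\n');

% Dates -> serial date numbers; remaining fields -> doubles (commas removed first)
data = [datenum(c{1}, 'yyyy-mm-dd') str2double(strrep([c{2:end}], ',', ''))]';

% One row vector per quantity
dates = data(1,:);
val2 = data(2,:);
val8 = data(8,:);
```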

Upvotes: 9

Amro

Reputation: 124563

I propose yet another solution, the shortest one in terms of MATLAB code. First, using sed, we format the file as a CSV file (comma separated, with each record on one line):

cat a.dat | sed -e 's/,//g ; s/[ \t]*$/,/g' -e '0~8 s/^\(.*\),$/\1\n/' | 
            sed -e :a -e '/,$/N; s/,\n/,/; ta' -e '/^$/d' > file.csv

Explanation: first we remove the thousands separators and replace any trailing whitespace at the end of each line with a comma. Then we remove that trailing comma from every 8th line. Finally we join the lines and remove the empty ones.
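The `0~8` address in the sed command uses GNU sed's first~step addressing, which is a GNU extension; the BSD sed that ships with macOS (the question mentions a Mac) does not support it. As a sketch under that assumption, the same CSV can be produced inside MATLAB. The raw lines are inlined here so the example runs standalone; with a real file they would come from something like `strsplit(fileread('a.dat'), sprintf('\n'))`:

```matlab
% Raw lines as they appear in the input file (the last one mimics a trailing newline)
raw = {'1999-01-04', '1,100.00', '1,060.00', '1,092.50', '0', '6,225', '1,336,605', '37', ...
       '1999-01-05 ', '1,122.50', '1,087.50', '1,122.50', '0', '3,250', '712,175', '14', ''};

% Trim whitespace (including a stray \r), drop blank lines, strip thousands separators
raw = strtrim(raw);
raw = raw(~cellfun('isempty', raw));
raw = strrep(raw, ',', '');

% One column per 8-line record, then join each record's fields with commas
rec = reshape(raw, 8, []);
rows = cell(1, size(rec, 2));
for k = 1:size(rec, 2)
    rows{k} = strjoin(rec(:, k).', ',');
end

% Write one record per line, in the same layout as the sed output
fid = fopen('file.csv', 'wt');
fprintf(fid, '%s\n', rows{:});
fclose(fid);
```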

The output will look like this:

1999-01-04,1100.00,1060.00,1092.50,0,6225,1336605,37
1999-01-05,1122.50,1087.50,1122.50,0,3250,712175,14

Next, in MATLAB, we simply use textscan to read each line, with the first field read as a string (to be converted to a serial date number with datenum) and the rest as numbers:

fid = fopen('file.csv', 'rt');
a = textscan(fid, '%s %f %f %f %f %f %f %f', 'Delimiter',',', 'CollectOutput',1);
fclose(fid);

M = [datenum(a{1}) a{2}]

and the resulting matrix M is:

  730124     1100     1060   1092.5    0   6225   1336605    37
  730125   1122.5   1087.5   1122.5    0   3250    712175    14

Upvotes: 4

Todd

Reputation: 2401

This is similar to Richie's answer and uses str2double to convert the strings in the file to doubles, but it processes the file line by line instead of splitting it with a regular expression. The output is a cell array of individual vectors, one for each of the seven numeric values (the date lines are skipped).

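function vectors = readdata(filename)
% Returns the 7 numeric values of each 8-line record as row vectors
% in a 7-by-1 cell array; the date lines are skipped.

fid = fopen(filename, 'rt');
if fid == -1
    error('Could not open %s', filename);
end

tline = fgetl(fid);
counter = 0;            % 0 = date line, 1..7 = numeric lines
vectors = cell(7,1);
while ischar(tline)
    if counter > 0
        vectors{counter} = [vectors{counter} str2double(tline)];
    end
    counter = counter + 1;
    if counter > 7
        counter = 0;
    end
    tline = fgetl(fid);
end

fclose(fid);
end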
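Because the loop above discards the date line (counter 0), a variant that keeps the dates may be useful. This sketch uses the same counting idea, computed with mod, over a cell array of lines so it runs without a file (in practice the lines would come from fgetl as above); `dates`, `pos` and the inlined lines are illustrative:

```matlab
lines = {'1999-01-04', '1,100.00', '1,060.00', '1,092.50', '0', '6,225', '1,336,605', '37', ...
         '1999-01-05', '1,122.50', '1,087.50', '1,122.50', '0', '3,250', '712,175', '14'};

dates = [];
vectors = cell(7, 1);
for i = 1:numel(lines)
    pos = mod(i - 1, 8);              % 0 = date line, 1..7 = numeric lines
    if pos == 0
        dates(end+1) = datenum(lines{i}, 'yyyy-mm-dd');
    else
        % strip thousands separators, then append to the vector for this position
        vectors{pos}(end+1) = str2double(strrep(lines{i}, ',', ''));
    end
end
```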

Upvotes: 0

Richie Cotton

Reputation: 121077

It isn't entirely clear what form you want the data in once you've read it. The code below puts it all in one matrix, with each row representing a group of 8 lines in your text file. You may wish to use different variables for different columns, or (if you have access to the Statistics Toolbox) use a dataset array.

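% Read file as text (txt rather than text, which would shadow the built-in text function)
txt = fileread('c:/data.txt');

% Split by line (handles both \n and \r\n line endings)
x = regexp(txt, '\r?\n', 'split');

% Remove commas from numbers, trim spaces, and drop blank lines (e.g. from a trailing newline)
x = strtrim(regexprep(x, ',', ''));
x = x(~cellfun('isempty', x));

% Number of items per object
n = 8;

% Get dates (as a column vector)
index = 1:length(x);
dates = datenum(x(rem(index, n) == 1), 'yyyy-mm-dd');
dates = dates(:);

% Get other numbers
nums = str2double(x(rem(index, n) ~= 1));
nums = reshape(nums, (n-1), length(nums)/(n-1))';

% Combine dates and numbers
thedata = [dates nums];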

You could also look into the function textscan for alternative ways of solving the problem.
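Instead of selecting elements with rem, the same split can be sketched with reshape: arranging the list of lines as an n-by-N cell array makes each column one record, with the dates in row 1. This assumes blank lines have already been removed, since reshape needs exactly n lines per record; the inlined lines stand in for `x` after the comma-stripping step:

```matlab
% Lines after splitting and removing commas (two sample records)
x = {'1999-01-04', '1100.00', '1060.00', '1092.50', '0', '6225', '1336605', '37', ...
     '1999-01-05', '1122.50', '1087.50', '1122.50', '0', '3250', '712175', '14'};
n = 8;

% Each column of rec is one record; row 1 holds the dates
rec = reshape(x, n, []);
dates = datenum(rec(1, :), 'yyyy-mm-dd');
dates = dates(:);                            % N-by-1
nums = str2double(rec(2:end, :)).';          % N-by-(n-1)
thedata = [dates nums];
```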

Upvotes: 2

dstibbe

Reputation: 1697

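Use a script to convert your text file into something that MATLAB can read.

e.g. turn it into a matrix with one record per row:

M = [
19990104 1100.00 1060.00 1092.50 0 6225 1336605 37
19990105 1122.50 1087.50 1122.50 0 3250 712175 14
...
]

Inside the brackets a comma separates elements, so the thousands separators have to be stripped, and a date such as 1999-01-04 would be evaluated as the subtraction 1999-1-4, so it has to be rewritten as a number (here yyyymmdd). Each line inside the brackets becomes one row, just as if it ended with ';'.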

Import this into MATLAB (for example, by running it as a script) and read the various vectors from the matrix.
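As a sketch of the import step, assuming the preprocessing script writes one record per line as plain space-separated numbers (thousands separators removed, dates rewritten as yyyymmdd), `load` can read such a file directly into a matrix whose columns are the vectors. The file name records.txt is hypothetical, and the first block only creates a sample file:

```matlab
% Create a sample preprocessed file (hypothetical name and assumed layout)
fid = fopen('records.txt', 'wt');
fprintf(fid, '19990104 1100.00 1060.00 1092.50 0 6225 1336605 37\n');
fprintf(fid, '19990105 1122.50 1087.50 1122.50 0 3250 712175 14\n');
fclose(fid);

% load reads a plain numeric text file into a matrix, one row per line
M = load('records.txt');

% Each column is one vector; convert the yyyymmdd column to serial date numbers
dates = datenum(num2str(M(:, 1)), 'yyyymmdd');
val8 = M(:, 8);
```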

Note: my MATLAB is a bit rusty, so this might contain errors.

Upvotes: 3
