Physicist

Reputation: 3048

Extracting numeric data from text in data files in MATLAB

I have a .txt data file that has a few rows of text comments at the beginning, followed by columns of actual data. It looks something like this:

lens (mm): 150
Power (uW): 24.4
Inner circle: 56x56
Outer Square: 256x320
remarks: this run looks good            
2.450000E+1 6.802972E+7 1.086084E+6 1.055582E-5 1.012060E+0 1.036552E+0
2.400000E+1 6.866599E+7 1.088730E+6 1.055617E-5 1.021491E+0 1.039043E+0
2.350000E+1 6.858724E+7 1.086425E+6 1.055993E-5 1.019957E+0 1.036474E+0
2.300000E+1 6.848760E+7 1.084434E+6 1.056495E-5 1.017992E+0 1.034084E+0

With importdata, MATLAB automatically separates the text from the numeric data. But how do I extract the numeric values embedded in the text (which is stored as a cell array)? What I want to achieve:

  1. extract those numbers (e.g. 150, 24.4)
  2. If possible, extract the names ('lens', 'Power')
  3. If possible, extract the units ('mm', 'uW')

Item 1 is the most important; items 2 and 3 are optional. I am also happy to change the format of the text comments if that simplifies the code.

Upvotes: 1

Views: 212

Answers (1)

Dev-iL

Reputation: 24169

Assuming your sample data is saved as demo.txt, you can do the following:

function q47203382
%% Reading from file:
COMMENT_ROWS = 5;
% Read info rows:
fid = fopen('demo.txt','r'); % open for reading
txt = textscan(fid,'%s',COMMENT_ROWS,'delimiter', '\n'); txt = txt{1};
fclose(fid);
% Read data rows:
numData = dlmread('demo.txt',' ',COMMENT_ROWS,0);
%% Processing:
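desc = cell(COMMENT_ROWS,1);  % one description per comment row
unit = cell(2,1);             % only the first two rows carry a unit
quant = cell(COMMENT_ROWS,1); % one quantity (as text) per comment row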
for ind1 = 1:numel(txt)
  if ind1 <= 2
    [desc{ind1}, unit{ind1}, quant{ind1}] = readWithUnit(txt{ind1});
  else
    [desc{ind1},             quant{ind1}] = readWOUnit(txt{ind1});
  end
end
%% Display:
disp(desc);
disp(unit);
disp(quant);
disp(mat2str(numData));
end

function [desc, unit, quant] = readWithUnit(str)
  tmp = strsplit(str,{' ','(',')',':'});
  [desc, unit, quant] = tmp{:};
end

function [desc, quant] = readWOUnit(str)
  tmp = strtrim(strsplit(str,': '));   
  [desc, quant] = tmp{:};
end

We read the file in two stages: textscan for the comment rows at the beginning, and dlmread for the numeric data that follows. After that, it's a matter of splitting each comment row to obtain the description, the unit (where present), and the quantity.

Here's the output of the above:

>> q47203382
    'lens'
    'Power'
    'Inner circle'
    'Outer Square'
    'remarks'

    'mm'
    'uW'

    '150'
    '24.4'
    '56x56'
    '256x320'
    'this run looks good'

    [24.5 68029720 1086084 1.055582e-05 1.01206  1.036552;
     24   68665990 1088730 1.055617e-05 1.021491 1.039043;
     23.5 68587240 1086425 1.055993e-05 1.019957 1.036474;
     23   68487600 1084434 1.056495e-05 1.017992 1.034084]

(I took the liberty of formatting the output a bit for easier viewing.)
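As an alternative to the two helper functions above, the splitting step could probably also be done with a single regular expression using named tokens. This is only a sketch and not part of the original answer: the variable names hdr, expr and tok are made up for illustration, and the pattern assumes every header row has the form "name (unit): value" with the "(unit)" part optional.

```matlab
% Header rows copied from the question's sample file.
hdr = {'lens (mm): 150'; 'Power (uW): 24.4'; 'Inner circle: 56x56'};

% name  : text before an optional "(unit)" and the colon
% unit  : text inside the parentheses, when present
% value : remainder of the line, with surrounding whitespace trimmed
expr = '^(?<name>[^(:]+?)\s*(?:\((?<unit>[^)]*)\))?\s*:\s*(?<value>.*?)\s*$';

% With a cell array input, regexp returns one struct per row.
tok = regexp(hdr, expr, 'names', 'once');

% Collect each field into its own cell array.
names  = cellfun(@(s) s.name,  tok, 'UniformOutput', false);
units  = cellfun(@(s) s.unit,  tok, 'UniformOutput', false);
values = cellfun(@(s) s.value, tok, 'UniformOutput', false);

disp([names, units, values]);
```

Rows without a unit should end up with an empty unit entry, so there is no need to know in advance which rows carry one.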

See also: str2double, which converts the extracted quantities (returned above as text, e.g. '150') to numbers.
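A minimal sketch of that conversion, assuming quant holds the character vectors shown in the output above (it is redefined here so the snippet runs on its own; vals and dims are illustrative names):

```matlab
% Quantities as returned by the parsing step (all still text).
quant = {'150'; '24.4'; '56x56'; '256x320'; 'this run looks good'};

% str2double accepts a cell array of character vectors and returns NaN
% for every entry that is not a single scalar number.
vals = str2double(quant);

% Entries such as '56x56' can be split on 'x' before converting.
dims = str2double(strsplit(quant{3}, 'x'));

disp(vals);
disp(dims);
```

The NaN entries in vals mark the rows that are not plain numbers (the two sizes and the remark), so they can be filtered out with isnan.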

Upvotes: 1
