Collin

Use MATLAB to extract data beyond "Data starts on next line:" in text-file

I'm trying to extract 4 columns of data from a .txt file using MATLAB (using MATLAB is non-negotiable in this case); however, a variable amount of header text precedes the data of interest. The line just above the data always reads:

Theta(deg) Phi(deg) Amp Phase Data starts on next line:

For more context, the transition from header to data looks like this:

Amp/Phase drift   =  -1.11  dB,  2.7  deg


Theta(deg)  Phi(deg)    Amp     Phase   Data starts on next line:
 -180.000   -90.000    16.842  -116.986
 -179.000   -90.000    16.837  -126.651
 -178.000   -90.000    16.549  -137.274

What is the best approach? Also, is there a method that might save time by only searching the first, say, 200 lines of text for the phrase "Data starts on next line:"?


Answers (2)

barceloco

If you can modify the text file, just comment out the non-data part by adding a % at the start of each header line. Then you can simply load the file into MATLAB.

Concretely: If your file data.txt contains

% Amp/Phase drift   =  -1.11  dB,  2.7  deg
%
%
% Theta(deg)  Phi(deg)    Amp     Phase   Data starts on next line:
 -180.000   -90.000    16.842  -116.986
 -179.000   -90.000    16.837  -126.651
 -178.000   -90.000    16.549  -137.274

then MATLAB can read it directly with

data=load('data.txt');

and the content of this variable will be

>> data

data =

 -180.0000  -90.0000   16.8420 -116.9860
 -179.0000  -90.0000   16.8370 -126.6510
 -178.0000  -90.0000   16.5490 -137.2740
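To try this approach without touching your real data.txt, the sketch below writes the commented sample to a temporary file (its name comes from tempname, so the file name is purely illustrative) and loads it with the '-ascii' option. It assumes, as load requires, that every uncommented row is whitespace-separated and has the same number of columns.

```matlab
% Sample lines from the question, with every header line commented out by a leading %
sampleLines = {
    '% Amp/Phase drift   =  -1.11  dB,  2.7  deg'
    '%'
    '% Theta(deg)  Phi(deg)    Amp     Phase   Data starts on next line:'
    ' -180.000   -90.000    16.842  -116.986'
    ' -179.000   -90.000    16.837  -126.651'
    ' -178.000   -90.000    16.549  -137.274'
};

fname = [tempname '.txt'];               % temporary file, so the real data.txt is untouched
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', sampleLines{:});    % write each line followed by a newline
fclose(fid);

data = load(fname, '-ascii');            % lines starting with % are treated as comments
delete(fname);                           % clean up the temporary file
disp(data)
```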

rayryeng

You can always open the file and loop through it line by line until you find Data starts on next line:. Once you're there, you can read the remaining values into a matrix. You can use a combination of fopen, strfind, fgetl, textscan, cell2mat and fclose to help you do that.

Something like this:

f = fopen('data.txt', 'r'); %// Replace filename with whatever you're looking at
if f == -1
    error('Could not open data.txt'); %// fopen returns -1 on failure
end

%// Go through each line in the text file until we find "Data starts on next line".
%// tline is used instead of line so the built-in line function is not shadowed.
tline = fgetl(f);
while ischar(tline) && isempty(strfind(tline, 'Data starts on next line'))
    tline = fgetl(f); %// fgetl returns -1 (not a char array) at end of file
end

%// File pointer is now just past the marker line.  Grab the data
if ischar(tline)
    data = cell2mat(textscan(f, '%f %f %f %f'));
else
    disp('Could not find data to parse');
end

fclose(f); %// Close file

The code speaks for itself. However, to be verbose, let's go through it line by line.

The first line opens your data file for reading. Next, we grab the first line of the text file and keep reading lines until one contains 'Data starts on next line'. This logic sits in a while loop, and strfind returns the starting indices of every occurrence of a pattern within some text. The text we're searching is the current line of the file, and the pattern is 'Data starts on next line'. If the pattern is not found, strfind returns an empty array, so the while loop keeps reading as long as strfind returns an empty array.
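As a quick illustration of the loop condition, the sketch below runs strfind on two lines copied from the sample file: a header line, where the pattern is absent, and the marker line, where it is present. The variable names are illustrative only.

```matlab
% Two lines taken from the sample file in the question
headerLine = 'Amp/Phase drift   =  -1.11  dB,  2.7  deg';
markerLine = 'Theta(deg)  Phi(deg)    Amp     Phase   Data starts on next line:';
pattern = 'Data starts on next line';

idxHeader = strfind(headerLine, pattern);  % no match: empty array
idxMarker = strfind(markerLine, pattern);  % match: starting index of the pattern

keepLooping = isempty(idxHeader)   % true, so the while loop would read another line
stopLooping = ~isempty(idxMarker)  % true, so the while loop would stop on this line
```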

I've placed some additional checks for the case where 'Data starts on next line' never appears. If we reach the end of the file, fgetl returns -1 instead of a character array. If that happens, there is no data to parse, so we skip the textscan step and just print a message.
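To see the end-of-file signal concretely, the sketch below writes a one-line temporary file (a throwaway name from tempname) and calls fgetl twice. It also shows why ischar is a direct way to test the result: a real line, even a blank one, comes back as a char array, while end of file comes back as the number -1.

```matlab
% Write a one-line temporary file, then read past its end with fgetl
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'only line\n');
fclose(fid);

fid = fopen(fname, 'r');
first = fgetl(fid);    % the text 'only line' (fgetl strips the newline)
second = fgetl(fid);   % end of file reached: the numeric value -1
fclose(fid);
delete(fname);

% ischar separates a real line from the -1 end-of-file marker
gotLine = ischar(first)
atEnd = ~ischar(second)
```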

If we do find this string, the file pointer has advanced past the marker line to where the numerical data begins. We use textscan to read the remaining lines; since there are four columns, the format '%f %f %f %f' tells it to expect four floating-point numbers per line. textscan returns a 1 x 4 cell array where each element is one column of data, so we use cell2mat to convert the result into a numeric matrix, which is stored in a variable called data. Finally, we close the file since we no longer need it.
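The textscan and cell2mat step can be tried in isolation, because textscan also accepts a char array in place of a file identifier. The sketch below feeds it two data rows from the question; the variable names are illustrative.

```matlab
% Two data rows from the question, supplied as one char array instead of a file
rows = sprintf('-180.000 -90.000 16.842 -116.986\n-179.000 -90.000 16.837 -126.651\n');

C = textscan(rows, '%f %f %f %f');  % 1x4 cell array, one 2x1 column per cell
data = cell2mat(C);                 % concatenate the columns into a 2x4 matrix
disp(size(C))
disp(data)
```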

When I run the above code and place your sample text data into a file called data.txt, this is what I get:

>> data

data =

 -180.0000  -90.0000   16.8420 -116.9860
 -179.0000  -90.0000   16.8370 -126.6510
 -178.0000  -90.0000   16.5490 -137.2740
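The question also asks whether the search can be limited to roughly the first 200 lines. A minimal sketch, assuming the marker always appears within that window, is to replace the while loop with a for loop over a fixed number of fgetl calls. The temporary file, the sample lines, and the names maxLines and marker are illustrative rather than part of the answer above. Note that the loop already stops as soon as the marker is found, so the cap mainly bounds the work when the marker is missing.

```matlab
% Build a temporary copy of the sample file from the question (illustrative test data)
sampleLines = {'Amp/Phase drift   =  -1.11  dB,  2.7  deg', '', '', ...
    'Theta(deg)  Phi(deg)    Amp     Phase   Data starts on next line:', ...
    ' -180.000   -90.000    16.842  -116.986', ...
    ' -179.000   -90.000    16.837  -126.651', ...
    ' -178.000   -90.000    16.549  -137.274'};
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
for k = 1:numel(sampleLines)
    fprintf(fid, '%s', [sampleLines{k} char(10)]);  % each line plus a newline
end
fclose(fid);

maxLines = 200;   % assumption: the marker always appears within the first 200 lines
marker = 'Data starts on next line';

fid = fopen(fname, 'r');
found = false;
for lineNo = 1:maxLines
    tline = fgetl(fid);
    if ~ischar(tline)                  % end of file reached before the marker
        break;
    end
    if ~isempty(strfind(tline, marker))
        found = true;                  % file pointer now sits on the first data row
        break;
    end
end

if found
    data = cell2mat(textscan(fid, '%f %f %f %f'));
else
    data = [];
    warning('Marker not found in the first %d lines', maxLines);
end
fclose(fid);
delete(fname);
disp(data)
```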
