jstluise

Reputation: 13

Extracting data from .txt in Matlab - Tried importdata()

I have .txt files that contain the output from a data logger. The data is logged in a very specific way, with one message per line, and with a message ID in front of each message. So, by knowing the message ID of each line, I will know what will follow on that line due to the message format. Each message (different ID) has a different format. An example of the data:

$GPGGA,220542.000,4745.8026,N,12211.0284,W,1,07,1.3,3.4,M,-17.2,M,,0000*67
$GPGSA,A,3,30,05,29,31,02,10,25,,,,,,2.0,1.3,1.5*3D
$GPGSV,3,1,12,29,78,315,39,05,52,080,45,30,43,288,46,25,41,196,33*72
$GPGSV,3,2,12,21,31,249,30,02,23,066,41,12,17,172,38,31,11,276,40*7F
$GPGSV,3,3,12,10,07,036,30,26,02,115,,18,01,199,,48,34,194,37*73
$GPRMC,220542.000,A,4745.8026,N,12211.0284,W,0.08,174.78,271011,,*12
$GPGGA,220543.000,4745.8025,N,12211.0284,W,1,07,1.3,3.4,M,-17.2,M,,0000*65

Everything is separated by commas, but since each line is different (i.e. the messages do not all share the same format), I can't use any of the CSV functions in Matlab.

Basically, what I want to do is search through the data, line by line. For each line, I want to determine the message ID, then store the remainder of the line in an array, since I will know the format of the line. In the end, I will have an array for each message type.

I can put everything into excel using csv, but since each line is different it is hard to extract the data...and I don't even know if that is possible in excel (it probably is).

In Matlab, I can't use anything csv because there are non-numerical values. I have tried reading the file directly and grabbing each line using fgetl(), then stepping through each line, but there has to be a more efficient way. I read something about saving an excel file, then going to matlab to extract the data, but it would be nice to eliminate that intermediate step.

Ideas about how to go about this would be nice. I'm not looking for anyone to write code for me...just to point me in the right direction.

Oh, and I thought importdata() would work, but when I tried importdata('filename.txt',',') it did not recognize the delimiter...?

Upvotes: 1

Views: 2357

Answers (1)

Amro

Reputation: 124563

Since your data has mixed formats, at some point in the process you will have to loop over the lines and parse them according to their respective formats. I doubt you will be able to parse the whole thing in one easy function call...

Here is my attempt at reading the file as you gave it (you didn't describe the format of the different log messages, so I had to come up with my own based on the few lines shown).

The idea is to define, for each message type, the format of the corresponding lines in the log file. We start by reading all lines from the file and extracting the message ID from each. Next, for each possible message ID, we extract the matching lines, parse them one by one using the specified format, and store the extracted info.

The result will be a cell array, with one cell for each message ID. Each of those elements is itself a cell array that stores the parsed rows and columns as a table.

%# line format for each type of message IDs
frmt = {
    '$GPGGA', '%s %f %f %c %f %c %f %f %f %f %c %f %c %f %s' ; 
    '$GPGSA', '%s %c %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %s' ; 
    '$GPGSV', '%s %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %s' ; 
    '$GPRMC', '%s %f %c %f %c %f %c %f %f %f %f %s' ; 
};

%# read log file as lines
fid = fopen('log.txt','rt');
C = textscan(fid, '%s', Inf, 'Delimiter','\n'); C = C{1};
fclose(fid);

%# get message ID of each line
msgId = strtok(C,',');

%# for each possible message ID
arr = cell(size(frmt,1),1);
for m=1:size(frmt,1)
    %# get lines matching this ID
    lines = C( ismember(msgId,frmt{m,1}) );

    %# parse lines using specified format
    arr{m} = cell(numel(lines), sum(frmt{m,2}=='%'));
    for i=1:numel(lines)
        arr{m}(i,:) = textscan(lines{i}, frmt{m,2}, 'Delimiter',',');
    end

    %# flatten nested cells containing strings
    idx = cellfun(@iscell, arr{m}(1,:));
    arr{m}(:,idx) = cellfun(@(x)x, arr{m}(:,idx));
end
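
If the per-message formats are not known up front, a format-free variant is to split each line on commas and group the raw string fields by message ID. The following is only a minimal sketch of that alternative, not the approach used above: the variable names `lines`, `byId`, `fields` and `key` are made up for illustration, the lines are hard-coded instead of read from the file, and every field stays a string until you convert it yourself (e.g. with str2double).

```matlab
%# a few log lines, hard-coded here so the sketch runs without a file
lines = {
    '$GPGGA,220542.000,4745.8026,N,12211.0284,W,1,07,1.3,3.4,M,-17.2,M,,0000*67'
    '$GPGSA,A,3,30,05,29,31,02,10,25,,,,,,2.0,1.3,1.5*3D'
    '$GPRMC,220542.000,A,4745.8026,N,12211.0284,W,0.08,174.78,271011,,*12'
    '$GPGGA,220543.000,4745.8025,N,12211.0284,W,1,07,1.3,3.4,M,-17.2,M,,0000*65'
};

%# group the comma-separated fields of each line by message ID
byId = struct();
for k = 1:numel(lines)
    %# split on commas; empty fields are kept as empty strings
    fields = regexp(lines{k}, ',', 'split');

    %# drop the leading '$' so the ID is a valid struct field name
    key = fields{1}(2:end);
    if ~isfield(byId, key)
        byId.(key) = {};
    end

    %# append the remaining fields as a new row for this message type
    byId.(key)(end+1,:) = fields(2:end);
end

%# convert a column to numbers when needed, e.g. latitude of $GPGGA
lat = str2double(byId.GPGGA(:,2));
```

This assumes every line of a given message type has the same number of fields (true for the sample shown); otherwise the row append would fail and the rows would need to be stored in separate cells.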

log.txt

$GPGGA,220542.000,4745.8026,N,12211.0284,W,1,07,1.3,3.4,M,-17.2,M,,0000*67
$GPGSA,A,3,30,05,29,31,02,10,25,,,,,,2.0,1.3,1.5*3D
$GPGSV,3,1,12,29,78,315,39,05,52,080,45,30,43,288,46,25,41,196,33*72
$GPGSV,3,2,12,21,31,249,30,02,23,066,41,12,17,172,38,31,11,276,40*7F
$GPGSV,3,3,12,10,07,036,30,26,02,115,,18,01,199,,48,34,194,37*73
$GPRMC,220542.000,A,4745.8026,N,12211.0284,W,0.08,174.78,271011,,*12
$GPGGA,220543.000,4745.8025,N,12211.0284,W,1,07,1.3,3.4,M,-17.2,M,,0000*65

The result for the above file:

>> arr
arr = 
    {2x15 cell}
    {1x18 cell}
    {3x20 cell}
    {1x12 cell}

For example, the rows matching message ID $GPGGA are:

>> arr{ ismember(frmt(:,1),'$GPGGA') }     %# arr{1}
ans = 
    '$GPGGA'    [220542]    [4745.8]    'N'    [12211]    'W'    [1]    [7]    [1.3]    [3.4]    'M'    [-17.2]    'M'    [NaN]    '0000*67'
    '$GPGGA'    [220543]    [4745.8]    'N'    [12211]    'W'    [1]    [7]    [1.3]    [3.4]    'M'    [-17.2]    'M'    [NaN]    '0000*65'
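
Once a message type is in this table form, individual columns can be pulled out as ordinary arrays. Here is a small sketch, assuming the column layout shown above; the variable `gga` is a hand-built stand-in for the first five columns of arr{1} so that the snippet runs on its own, and `t`, `lat` and `hem` are names chosen for illustration. Note that the display above is rounded by the default short format; the stored values keep their full precision.

```matlab
%# stand-in for the first five columns of the $GPGGA table
gga = {
    '$GPGGA', 220542, 4745.8026, 'N', 12211.0284
    '$GPGGA', 220543, 4745.8025, 'N', 12211.0284
};

%# numeric columns: each cell holds a scalar, so cell2mat gives a column vector
t   = cell2mat(gga(:,2));
lat = cell2mat(gga(:,3));

%# single-character columns: stack the chars into a column
hem = vertcat(gga{:,4});
```

If a numeric column can contain empty fields, those rows hold NaN (as in the second-to-last column above), so cell2mat still works and the gaps can be located afterwards with isnan.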

Upvotes: 1
