John Deutsch

Reputation: 11

Read ASC file into MATLAB using textscan - variable column lengths

I am trying to read the following data into MATLAB:

'0.000000 1  18EFFA59x  Rx D 8  AD  09  02  00  00  00  00  30'  
'0.004245 1  14EFF01Cx  Rx D 6  DB  00  FF  FF  00  71'  
'0.004640 1  CEF801Cx   Rx D 3  3F  00  3B'  
'0.005130 1  14EF131Cx  Rx D 6  DB  00  FF  FF  00  71'  
'0.005630 1  CEF801Cx   Rx D 3  3F  00  C3'  
'0.010015 1  18EFFA59x  Rx D 8  AD  07  01  00  00  00  00  30'  
'0.014145 1  CF004F0x   Rx D 8  F0  FF  7D  00  00  FF  FF  FF'  
'0.015060 1  18EFFA59x  Rx D 8  AD  07  02  00  00  00  00  30'  
'0.018235 1  18EF1CF0x  Rx D 8  F2  1E  05  FF  FF  00  71  FF'  
'0.018845 1  18EA5941x  Rx D 3  09  FF  00'  

I can easily read in each line as a string - but to make post-processing more efficient I'd like to separate each line by its delimiter - which is whitespace. In other words, the end result should be a non-singleton cell array. I can't seem to find a very efficient way of doing this. Efficiency is important because these files are several million lines long and processing in MATLAB with strings/cells takes a long time.

Any help would be appreciated. Thanks.

Upvotes: 0

Views: 218

Answers (1)

craigim

Reputation: 3914

You appear to have fixed-width fields, so I would treat the data as such and let textscan do most of the pre-processing: turn off delimiters and whitespace, and define the field widths and types explicitly:

test = {...
    '0.000000 1  18EFFA59x  Rx D 8  AD  09  02  00  00  00  00  30'
    '0.004245 1  14EFF01Cx  Rx D 6  DB  00  FF  FF  00  71'
    '0.004640 1  CEF801Cx   Rx D 3  3F  00  3B'
    '0.005130 1  14EF131Cx  Rx D 6  DB  00  FF  FF  00  71'
    '0.005630 1  CEF801Cx   Rx D 3  3F  00  C3'
    '0.010015 1  18EFFA59x  Rx D 8  AD  07  01  00  00  00  00  30'
    '0.014145 1  CF004F0x   Rx D 8  F0  FF  7D  00  00  FF  FF  FF'
    '0.015060 1  18EFFA59x  Rx D 8  AD  07  02  00  00  00  00  30'
    '0.018235 1  18EF1CF0x  Rx D 8  F2  1E  05  FF  FF  00  71  FF'
    '0.018845 1  18EA5941x  Rx D 3  09  FF  00'};

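% Join the lines into one newline-separated string that stands in for a file.
test = strjoin(test', '\n');

% Fixed-width read: with delimiters and whitespace disabled, each %N field
% takes up to N characters of the line.
C = textscan(test, '%8.6f %2u %11s %4s %2s %2u %33s', 'delimiter', '','whitespace','');

col1 = C{1};
col2 = C{2};
col3 = strtrim(C{3});        % remove the padding spaces around the text
col3 = cellfun(@(x)hex2dec(x(1:end-1)), col3); % drop the trailing 'x', then convert hex to decimal
col4 = strtrim(C{4});
col5 = strtrim(C{5});
col6 = C{6};
col7 = strtrim(C{7});        % variable-length remainder of the line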

In the real world, you'd replace the text string with a file ID returned by fopen. For the last, variable-length field, just read the whole thing in, making sure you specify the maximum possible length: textscan reads a field until it has consumed the given width or reaches a newline character, whichever comes first (in fact, I made the last field width 1 larger than needed, just to make sure). Each field is then aggregated into its own cell of C. I also took the liberty of converting the third field from hex to decimal to show how you might post-process the numbers further.
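
As a rough sketch of the file-based version, assuming the log keeps exactly the column layout of the sample lines, the call might look like the following. The temporary file and the names fname, payload, and bytes are illustrative only and not part of the original answer. The last two lines show one possible way to split the variable-length field of a single row into numeric bytes.

```matlab
% Write two sample lines to a temporary file so the sketch is self-contained.
% In practice fname would simply be the path to your own ASC log.
fname = [tempname '.asc'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', '0.000000 1  18EFFA59x  Rx D 8  AD  09  02  00  00  00  00  30');
fprintf(fid, '%s\n', '0.004640 1  CEF801Cx   Rx D 3  3F  00  3B');
fclose(fid);

% Same format string as above, but textscan now reads from a file ID.
fid = fopen(fname, 'r');
C = textscan(fid, '%8.6f %2u %11s %4s %2s %2u %33s', ...
    'Delimiter', '', 'Whitespace', '');
fclose(fid);
delete(fname);

% Split the variable-length field of the second row into numeric bytes.
payload = strtrim(C{7}{2});
bytes = hex2dec(strsplit(payload));  % strsplit collapses repeated spaces by default
```

Splitting every row this way in a loop would likely be slow on millions of lines, so it is probably best applied only to the rows you actually need.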

As a further note, if you really do have gigantic files and need maximum speed, you could skip the strtrim step on the character fields by specifying %*ns, where n is the field width, for any known gaps, such as the 2-character gap between columns 3 and 4. The star tells textscan to skip that field. However, I find the approach shown above a bit more readable and intuitive, and it leaves a small margin of error in case one of the fields, such as the 4th, occasionally has a 3-character entry.
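
A minimal sketch of that skip-field variant is below. It has not been checked against a real log, and the widths are my own reading of the sample lines (8, 2, gap 2, 9, gap 2, 2, gap 1, 1, 2, gap 2, then up to 31 characters), so treat them as assumptions. The names sample, txt, and fmt are hypothetical.

```matlab
% Two sample lines from the question, joined into one string as in the answer.
sample = {'0.000000 1  18EFFA59x  Rx D 8  AD  09  02  00  00  00  00  30', ...
          '0.004640 1  CEF801Cx   Rx D 3  3F  00  3B'};
txt = strjoin(sample, '\n');

% %*Ns consumes N characters without creating an output column, so the
% known gaps never reach the text fields.
fmt = '%8.6f %2u %*2s %9s %*2s %2s %*1s %1s %2u %*2s %31s';
C = textscan(txt, fmt, 'Delimiter', '', 'Whitespace', '');

col4 = C{4};   % read without leading padding, so no strtrim is needed
col5 = C{5};
```

Skipped fields produce no output, so C still has seven cells. The third field is read with a width of 9, so an 8-character ID such as CEF801Cx still carries one trailing space and would need strtrim or similar handling.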

Upvotes: 0
