Reputation: 11
I am trying to read the following data into MATLAB:
'0.000000 1 18EFFA59x Rx D 8 AD 09 02 00 00 00 00 30'
'0.004245 1 14EFF01Cx Rx D 6 DB 00 FF FF 00 71'
'0.004640 1 CEF801Cx Rx D 3 3F 00 3B'
'0.005130 1 14EF131Cx Rx D 6 DB 00 FF FF 00 71'
'0.005630 1 CEF801Cx Rx D 3 3F 00 C3'
'0.010015 1 18EFFA59x Rx D 8 AD 07 01 00 00 00 00 30'
'0.014145 1 CF004F0x Rx D 8 F0 FF 7D 00 00 FF FF FF'
'0.015060 1 18EFFA59x Rx D 8 AD 07 02 00 00 00 00 30'
'0.018235 1 18EF1CF0x Rx D 8 F2 1E 05 FF FF 00 71 FF'
'0.018845 1 18EA5941x Rx D 3 09 FF 00'
I can easily read in each line as a string, but to make post-processing more efficient I'd like to split each line on its delimiter, which is whitespace. In other words, the end result should be a cell array with one column per field, not a single column of whole lines. I can't seem to find an efficient way of doing this. Efficiency is important because these files are several million lines long, and processing strings/cells in MATLAB takes a long time.
Any help would be appreciated. Thanks.
Upvotes: 0
Views: 218
Reputation: 3914
You appear to have fixed-width fields, so I would treat them as such and let textscan do most of the pre-processing for you by turning off delimiters and whitespace and defining the field widths and types explicitly:
test = {...
'0.000000 1 18EFFA59x Rx D 8 AD 09 02 00 00 00 00 30'
'0.004245 1 14EFF01Cx Rx D 6 DB 00 FF FF 00 71'
'0.004640 1 CEF801Cx Rx D 3 3F 00 3B'
'0.005130 1 14EF131Cx Rx D 6 DB 00 FF FF 00 71'
'0.005630 1 CEF801Cx Rx D 3 3F 00 C3'
'0.010015 1 18EFFA59x Rx D 8 AD 07 01 00 00 00 00 30'
'0.014145 1 CF004F0x Rx D 8 F0 FF 7D 00 00 FF FF FF'
'0.015060 1 18EFFA59x Rx D 8 AD 07 02 00 00 00 00 30'
'0.018235 1 18EF1CF0x Rx D 8 F2 1E 05 FF FF 00 71 FF'
'0.018845 1 18EA5941x Rx D 3 09 FF 00'};
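test = strjoin(test', '\n');   % join the lines into one newline-separated string
% Delimiter and Whitespace are turned off, so the field widths alone split each line
C = textscan(test, '%8.6f %2u %11s %4s %2s %2u %33s', 'delimiter', '', 'whitespace', '');
col1 = C{1};                   % double
col2 = C{2};                   % uint32
col3 = strtrim(C{3});          % e.g. '18EFFA59x'
col3 = cellfun(@(x)hex2dec(x(1:end-1)), col3); % for instance: drop the trailing 'x', convert hex to decimal
col4 = strtrim(C{4});
col5 = strtrim(C{5});
col6 = C{6};                   % uint32
col7 = strtrim(C{7});          % the remaining, variable-length field as one string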
In the real world, you'd replace the text string with a file identifier from fopen. For the last, variable-length field, just read the whole thing in, making sure you specify the maximum possible length: MATLAB reads a field until it reaches the specified width or a newline character (in fact, I made the last field width 1 larger, just to be safe). textscan returns each field as its own cell of C. I also took the liberty of converting the third field from hex to decimal to show how you might post-process the numbers further.
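A minimal sketch of the file-based version follows. To keep it runnable it first writes a tiny stand-in file to a temporary path; the two-field layout (`%4f%2s`, a 4-character number followed by a 2-character string) is made up for the demo and is not the layout of the log above. For the real log you would open your own file and pass the fixed-width format string from the answer instead.

```matlab
% Hypothetical stand-in for the real log: two fixed-width records
% (4-character number, then 2-character string) written to a temp file.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '0.01Rx\n0.02Tx\n');
fclose(fid);

% Pass the file id to textscan instead of a string; for the real log,
% swap in the fixed-width format string used in the answer.
fid = fopen(fname, 'r');
if fid == -1
    error('Could not open %s', fname);
end
C = textscan(fid, '%4f%2s', 'Delimiter', '', 'Whitespace', '');
fclose(fid);
delete(fname);   % clean up the demo file

col1 = C{1};   % numeric column (double)
col2 = C{2};   % cell array of character vectors
```

Reading straight from the file id avoids building one huge joined string in memory first, which matters for multi-million-line logs.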
As a further note, if you really do have gigantic files and need maximum speed, you could skip the strtrim step on the character fields by specifying %*ns, where n is the gap width, for any known gaps such as the 2-character gap between columns 3 and 4. The asterisk tells textscan to read and discard that field. I find the strtrim approach a bit more readable and intuitive, however, and it leaves a small margin for error in case one of the fields, such as the 4th, occasionally has a 3-character entry.
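A small sketch of the skip-field idea on a single hypothetical line: a 2-character field, a 2-character gap of spaces, then another 2-character field. The line and widths are invented for illustration; in the real log you would measure the gap widths from the file. Because whitespace is turned off, `%*2s` consumes the two spaces instead of skipping over them, and the discarded gap does not appear in the output.

```matlab
% Hypothetical fixed-width record: field 'AB', a 2-character gap, then field 'Rx'.
rec = 'AB  Rx';

% %*2s reads and discards the 2-character gap, so no strtrim is needed afterwards.
C = textscan(rec, '%2s%*2s%2s', 'Delimiter', '', 'Whitespace', '');

first  = C{1};   % the field before the gap
second = C{2};   % the field after the gap (the skipped gap has no output cell)
```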
Upvotes: 0