How to read data using a custom delimiter

Question

Related to this question, I have this data sample saved in a .txt file:

'1458937887.70818 $GPGGA,200228.90,3555.3269,N,15552.9641,A*25'
'1458937887.709668 $GPVTG,56.740,T,56.740,M,
 0.069,N,0.127,K,D*2D'
'1458937887.712022 $GPGGA,200229.00,3555.3269,N,
 15552.9641,C*2B'
'1458937887.714071 $GPVTG,286.847,T,286.847,M,0.028,N,0.051,K,D*28'

I use the following to read the data:

textscan(fileID,'%s','Delimiter','\n')

However, this is not what I want. I want the delimiter to be an alphanumeric character, followed by *, followed by two alphanumeric characters, then a newline (\n).

Edit: The main problem is that some data are saved across two lines. For example, lines 2 and 3 above belong to the same data packet.

rayryeng · Accepted Answer

One suggestion is to read the entire file as a single string and remove the newlines in it yourself. Then use regular expressions to insert new newlines after each occurrence of the desired pattern: one alphanumeric character, followed by an asterisk *, followed by two alphanumeric characters. Once you have that, use textscan with the Delimiter flag to split the string at the newline characters you inserted.
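The idea can be tried on an in-memory string before touching a file. This is a minimal sketch that assumes the file looks like the sample above without the surrounding quotes and without leading spaces on the continuation lines. It swaps textscan for strsplit in the last step only to keep the example short; strsplit leaves an empty cell after the final newline, which is filtered out.

```matlab
nl = char(10);  % line feed

% Assumed file contents: lines 2-3 and 4-5 are each one packet split in two.
raw = ['1458937887.70818 $GPGGA,200228.90,3555.3269,N,15552.9641,A*25' nl ...
       '1458937887.709668 $GPVTG,56.740,T,56.740,M,' nl ...
       '0.069,N,0.127,K,D*2D' nl ...
       '1458937887.712022 $GPGGA,200229.00,3555.3269,N,' nl ...
       '15552.9641,C*2B' nl ...
       '1458937887.714071 $GPVTG,286.847,T,286.847,M,0.028,N,0.051,K,D*28' nl];

% Step 1: drop every line feed (10) and carriage return (13).
joined = raw(raw ~= 10 & raw ~= 13);

% Step 2: put a line feed back after each "<alnum>*<alnum><alnum>" checksum.
% $0 is the whole match, so the checksum itself is kept.
marked = regexprep(joined, '\w\*\w{2}', ['$0' nl]);

% Step 3: split on the inserted line feeds and discard empty pieces.
packets = strsplit(marked, nl);
packets = packets(~cellfun(@isempty, packets));
disp(packets.')
```

Each element of packets should now hold one complete packet, including the ones that were split across two lines in the file.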

First, use fread to read the data from the file. We can slightly abuse this command by asking for [1 inf] elements, which reads one row of characters all the way to the end of the file. We also need to specify that each element in the file is a character ('char'). The output of fread is in fact a double array where each element is the ASCII code of a character in the file.

Next, we search for any newline characters and remove them. On Windows, each line ends with a carriage return followed by a newline, but the code below works either way: the newline is ASCII code 10 and the carriage return is ASCII code 13, and we use logical indexing to remove both. Then we use regexprep to search for the desired pattern and insert newline characters ourselves. Finally, we pass the result to textscan, just as you called it.

As such:

fileID = fopen('...'); %// Place filename here
if fileID < 0
    error('Could not open the file.');
end
str = fread(fileID, [1 inf], 'char'); %// Read the whole file as one row of character codes
fclose(fileID); %// Close the file as soon as it has been read

%// Remove newlines (10) and carriage returns (13, if applicable)
str(str == 10 | str == 13) = [];

%// Search for the desired pattern and insert a newline after each match
out = regexprep(char(str), '\w\*\w{2}', '$0\n');

%// Finally split up the strings at the inserted newlines
txt = textscan(out, '%s', 'Delimiter', '\n');
txt = txt{1};
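The fread and newline-removal steps can be checked on their own. This sketch writes a small throwaway file with Windows-style CRLF line endings (the contents and the temporary path are made up for the example), reads it back the same way, and shows that fread returns doubles and that both line-ending codes disappear.

```matlab
% Hypothetical test file; tempname gives a unique path that is deleted afterwards.
fname = [tempname '.txt'];

% Write two CRLF-terminated lines. Mode 'w' (no 't') writes the bytes as given.
fileID = fopen(fname, 'w');
fprintf(fileID, 'A,B*12\r\nC,D*3F\r\n');
fclose(fileID);

% Read everything back as one row of character codes.
fileID = fopen(fname, 'r');
str = fread(fileID, [1 inf], 'char');
fclose(fileID);
delete(fname);

disp(class(str))                  % fread returns a double array, not char
str(str == 10 | str == 13) = [];  % drop LF (10) and CR (13) by logical indexing
disp(char(str))                   % the two lines joined with no line breaks
```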

When we use regexprep, we search for an alphanumeric character \w (in MATLAB this matches a letter, digit, or underscore), followed by an asterisk \*, followed by two alphanumeric characters \w{2}. The \ is important here because on its own * is a regex quantifier meaning "zero or more of the previous token", so to match a literal * you have to prepend it with a \ character. In the replacement string, $0 refers to the entire match, so each checksum such as A*25 is kept and a newline is inserted right after it. Another intricacy is that fread returned a double array, so we must cast it to char before passing it to regexprep. Also, textscan's output in this case is a 1 x 1 cell array whose only cell holds the cell array of strings, so we unpack it by referencing the first cell. The desired output is in txt.
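To see what $0 does, here is a small sketch on a made-up string of two packets glued together. It compares the replacement used in the answer with one that omits $0. The last line is an alternative that is not part of the answer: a lazy match with regexp and the 'match' option that extracts each packet directly, without inserting newlines first.

```matlab
nl = char(10);                 % line feed
s  = 'x,A*25y,D*2D';           % made-up input: two packets with no separator

% $0 stands for the whole match, so the checksum survives and a newline follows it.
kept = regexprep(s, '\w\*\w{2}', ['$0' nl]);

% Without $0 the matched text is replaced, i.e. the checksums are deleted.
dropped = regexprep(s, '\w\*\w{2}', nl);

% Alternative (not from the answer): .*? grows only until the next checksum.
packets = regexp(s, '.*?\w\*\w{2}', 'match');
```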
