Reputation: 21
Hi all and thanks in advance. This is my first post here, please let me know if I should do this differently.
I have a large text file containing lines like the following, where fields are separated by two or more spaces and some fields contain single spaces:
"DATE  TIMESTAMP  T W M  T AL M C  A_B_C"
At first I read this in using the fopen and fgetl commands, so that I get a string:
Readout = DATE  TIMESTAMP  T W M  T AL M C  A_B_C
I want to split this into fields, e.g. via textscan. While I feel I know MATLAB, I am by no means an expert with this command and have trouble using it.
I want to get:
A = 'DATE' 'TIMESTAMP' 'T W M' 'T AL M C' 'A_B_C'
However using the following code:
A = textscan(Readout,'%s');
A = A{1}';
I get:
A = 'DATE' 'TIMESTAMP' 'T' 'W' 'M' 'T' 'AL' 'M' 'C' 'A_B_C'
As I asked in the title, is there a way to ignore the single spaces?
PS: At the end of writing this I came up with a not very elegant solution, shown below. I would still like to know if there is a nicer one:
ReadBetter = [];
for n = 1:length(Read)-1
    % Drop a space only when the next character is not a space
    if ~(Read(n) == ' ' && Read(n+1) ~= ' ')
        ReadBetter = [ReadBetter Read(n)];
    end
end
ReadBetter = [ReadBetter Read(n+1)];  % always keep the last character
Read
ReadBetter
Output:
Read =
DATE  TIMESTAMP  T W M  T AL M C  A_B_C
ReadBetter =
DATE TIMESTAMP TWM TALMC A_B_C
Now I can use ReadBetter with textscan.
Thanks for this awesome webpage and the help I found here, in many other posts
Upvotes: 2
Views: 12269
Reputation: 3244
Newer versions of MATLAB have a 'split' option for regexp, similar to Perl's split. Split on runs of two or more spaces so that single spaces stay inside a field:
>> str = 'DATE  TIMESTAMP  T W M  T AL M C  A_B_C';
>> out = regexp(str, ' {2,}', 'split')
out =
'DATE' 'TIMESTAMP' 'T W M' 'T AL M C' 'A_B_C'
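The same split can be applied to every line of a file. The following is only a sketch, assuming each line has the same number of fields and that fields are separated by at least two spaces. The temporary file and its contents are placeholders written by the example itself so that it runs on its own.

```matlab
% Write a small sample file so the sketch is self-contained.
% The file name and contents are placeholders, not the asker's data.
fname = [tempname '.dat'];
fid = fopen(fname, 'wt');
fprintf(fid, '%s\n', 'DATE  TIMESTAMP  T W M  T AL M C  A_B_C');
fprintf(fid, '%s\n', 'DATE  TIMESTAMP  T W M  T AL M C  A_B_C');
fclose(fid);

% Read line by line and split each line on runs of two or more spaces.
fid = fopen(fname, 'rt');
rows = {};
tline = fgetl(fid);
while ischar(tline)              % fgetl returns -1 at end of file
    rows{end+1} = regexp(strtrim(tline), ' {2,}', 'split');
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);

% Stack the per-line results into one cell array, one row per line.
% This step fails if the lines have different numbers of fields.
result = vertcat(rows{:});
```

Each row of result then holds the fields of one line, with single spaces preserved inside a field.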
Upvotes: 2
Reputation: 124563
Here's one way to read your file. Suppose file.dat contains:
DATE  TIMESTAMP  T W M  T AL M C  A_B_C
DATE  TIMESTAMP  T W M  T AL M C  A_B_C
DATE  TIMESTAMP  T W M  T AL M C  A_B_C
DATE  TIMESTAMP  T W M  T AL M C  A_B_C
DATE  TIMESTAMP  T W M  T AL M C  A_B_C
DATE  TIMESTAMP  T W M  T AL M C  A_B_C
Read the single-character pieces separately, then join them back together with spaces:
fid = fopen('file.dat', 'rt');
C = textscan(fid, '%s %s %c%c%c %c%2c%c%c %s');
fclose(fid);
C = [ C{1}, C{2}, ...
      cellstr( strcat(C{3},{' '},C{4},{' '},C{5}) ), ...
      cellstr( strcat(C{6},{' '},C{7},{' '},C{8},{' '},C{9}) ), ...
      C{10}
    ]
The resulting cell-array:
C =
'DATE' 'TIMESTAMP' 'T W M' 'T AL M C' 'A_B_C'
'DATE' 'TIMESTAMP' 'T W M' 'T AL M C' 'A_B_C'
'DATE' 'TIMESTAMP' 'T W M' 'T AL M C' 'A_B_C'
'DATE' 'TIMESTAMP' 'T W M' 'T AL M C' 'A_B_C'
'DATE' 'TIMESTAMP' 'T W M' 'T AL M C' 'A_B_C'
'DATE' 'TIMESTAMP' 'T W M' 'T AL M C' 'A_B_C'
Upvotes: 0
Reputation: 125864
A simpler way to parse your string is to use REGEXP to find the indices where two or more whitespace characters occur in a row, use those indices to break the string into a cell array of strings with MAT2CELL, and then use STRTRIM to remove leading and trailing whitespace from each substring. For example:
>> str = 'DATE  TIMESTAMP  T W M  T AL M C  A_B_C';
>> cutPoints = regexp(str,'\s{2,}');
>> cellArr = mat2cell(str,1,diff([0 cutPoints numel(str)]));
>> cellArr = strtrim(cellArr)
cellArr =
'DATE' 'TIMESTAMP' 'T W M' 'T AL M C' 'A_B_C'
Upvotes: 1
Reputation: 757
I think that you are making things too complicated. Just use:
fid = fopen('pathandnameoffile');
C = textscan(fid,'%s','Delimiter','\t');
fclose(fid);
The example above assumes that you have tabs as delimiters. Change it to something else if required.
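If the file uses runs of spaces rather than tabs, one possible workaround is to convert each run of two or more spaces into a tab first and then let textscan split on tabs. This is a minimal sketch on a single string, assuming that field separators are always at least two spaces wide. The variable names tabbed and fields are made up for the illustration.

```matlab
% Sample line with double-space separators (assumed input format).
str = 'DATE  TIMESTAMP  T W M  T AL M C  A_B_C';

% Replace every run of two or more spaces with a single tab.
tabbed = regexprep(str, ' {2,}', '\t');

% With a tab delimiter, %s keeps the single spaces inside a field.
C = textscan(tabbed, '%s', 'Delimiter', '\t');
fields = C{1}';   % row cell array of the parsed fields
```

The same replacement could be applied to each line returned by fgetl before parsing it.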
Upvotes: 0