Birk Birk

Reputation: 21

Matlab: How can I read in a string separated with spaces but ignore single spaces (using textscan)?

Hi all, and thanks in advance. This is my first post here, so please let me know if I should do anything differently.

I have a large text file containing lines like the following:

"DATE      TIMESTAMP    T W M     T AL M C  A_B_C"

First I read the file with fopen and fgetl, so that I get a string:

Readout = DATE      TIMESTAMP    T W M     T AL M C  A_B_C

I want to split this string, e.g. with textscan. While I know MATLAB fairly well, I am by no means an expert with this command and have trouble using it.

I want to get:

A = 'DATE' 'TIMESTAMP' 'T W M' 'T AL M C' 'A_B_C'

However using the following code:

 A = textscan(Readout,'%s');
 A = A{1}';

I get:

A = 'DATE'    'TIMESTAMP'    'T'    'W'    'M'    'T'    'AL'    'M'    'C'    'A_B_C'

As I asked in the title, is there a way to ignore the single spaces?

PS: While writing this I came up with a not very elegant solution, but I would still like to know if there is a nicer one:

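Read = Readout;                 % the line read with fgetl above
ReadBetter = [];
for n = 1:length(Read)-1
    % Skip a space when the next character is not a space; this drops
    % the single spaces and the last space of every longer run.
    if ~(Read(n) == ' ' && Read(n+1) ~= ' ')
        ReadBetter = [ReadBetter Read(n)];
    end
end
ReadBetter = [ReadBetter Read(end)];   % always keep the last character
Read
ReadBetter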

Output:
Read =

DATE      TIMESTAMP    T W M     T AL M C   A_B_C

ReadBetter =

DATE     TIMESTAMP   TWM    TALMC   A_B_C

Now I can use ReadBetter with textscan.
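The loop can also be written as a single vectorized call. The sketch below is an equivalent I am assuming, not something from the original post: regexprep with the lookahead ` (?=[^ ])` deletes every space that is directly followed by a non-space character, which is exactly the set of characters the loop leaves out of ReadBetter.

```matlab
Read = 'DATE      TIMESTAMP    T W M     T AL M C  A_B_C';

% Delete each space that is directly followed by a non-space character.
% This removes the single spaces inside 'T W M' and 'T AL M C' and
% shortens every longer run of spaces by one, just like the loop.
ReadBetter = regexprep(Read, ' (?=[^ ])', '');

% A plain whitespace-delimited textscan now gives one token per field.
A = textscan(ReadBetter, '%s');
A = A{1}'
```

Like the loop, this turns 'T W M' into 'TWM'; if the inner spaces should survive, splitting on runs of two or more spaces (as in the answers) avoids that.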

Thanks for this awesome site and for the help I have found here in many other posts.

Upvotes: 2

Views: 12269

Answers (4)

Rich C

Reputation: 3244

Newer versions of MATLAB have a 'split' option for regexp, similar to Perl's split.

>> str = 'DATE      TIMESTAMP    T W M     T AL M C  A_B_C';
>> out = regexp(str, '  +', 'split')

out = 

    'DATE'    'TIMESTAMP'    'T W M'    'T AL M C'    'A_B_C'
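One caveat, following from how the 'split' option works: if a line starts or ends with two or more spaces (for example fixed-width padding, which is an assumption here, not something shown in the question), the result gets empty strings at the ends. A minimal sketch that trims the line first and uses \s{2,} so that runs of two or more whitespace characters of any kind (spaces or tabs) act as separators:

```matlab
padded = '  DATE      TIMESTAMP    T W M     T AL M C  A_B_C   ';

% Splitting the raw line yields empty cells for the leading and trailing runs.
raw = regexp(padded, '  +', 'split');      % raw{1} and raw{end} are empty

% Trimming first avoids them; \s{2,} also matches runs containing tabs.
out = regexp(strtrim(padded), '\s{2,}', 'split')
```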

Upvotes: 2

Amro

Reputation: 124563

Here's one way to read your file:

file.dat

DATE      TIMESTAMP    T W M     T AL M C  A_B_C
DATE      TIMESTAMP    T W M     T AL M C  A_B_C
DATE      TIMESTAMP    T W M     T AL M C  A_B_C
DATE      TIMESTAMP    T W M     T AL M C  A_B_C
DATE      TIMESTAMP    T W M     T AL M C  A_B_C
DATE      TIMESTAMP    T W M     T AL M C  A_B_C

MATLAB code:

fid = fopen('file.dat', 'rt');
C = textscan(fid, '%s %s %c%c%c %c%2c%c%c %s');
fclose(fid);
C = [ C{1}, C{2}, ...
    cellstr( strcat(C{3},{' '},C{4},{' '},C{5}) ), ...
    cellstr( strcat(C{6},{' '},C{7},{' '},C{8},{' '},C{9}) ), ...
    C{10}
]

The resulting cell-array:

C = 
    'DATE'    'TIMESTAMP'    'T W M'    'T AL M C'    'A_B_C'
    'DATE'    'TIMESTAMP'    'T W M'    'T AL M C'    'A_B_C'
    'DATE'    'TIMESTAMP'    'T W M'    'T AL M C'    'A_B_C'
    'DATE'    'TIMESTAMP'    'T W M'    'T AL M C'    'A_B_C'
    'DATE'    'TIMESTAMP'    'T W M'    'T AL M C'    'A_B_C'
    'DATE'    'TIMESTAMP'    'T W M'    'T AL M C'    'A_B_C'
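The format string above encodes the exact character layout of each field (three single characters, then a two-character 'AL', and so on), so it would need rewriting if the field contents change. A sketch of a more general alternative, assuming only that fields are separated by two or more whitespace characters and that every line has the same number of fields: read whole lines with textscan and split each one with regexp. It writes a small temporary sample file first (the file name and second line are made up for illustration) so that it runs on its own.

```matlab
% Write a small sample file (a stand-in for file.dat) so the sketch runs on its own.
fname = [tempname '.dat'];
fid = fopen(fname, 'wt');
fprintf(fid, 'DATE      TIMESTAMP    T W M     T AL M C  A_B_C\n');
fprintf(fid, 'DATE2     STAMP2       X Y       P Q R S   D_E_F\n');
fclose(fid);

% Read every line as one string: with '\n' as the only delimiter,
% %s does not stop at the spaces inside a line.
fid = fopen(fname, 'rt');
txt = textscan(fid, '%s', 'Delimiter', '\n');
fclose(fid);
delete(fname);
txt = strtrim(txt{1});
txt = txt(~cellfun(@isempty, txt));   % drop blank lines, if any

% Split each line on runs of two or more whitespace characters;
% regexp on a cell array returns one cell array of parts per line.
parts = regexp(txt, '\s{2,}', 'split');

% Stack into an N-by-5 cell array (assumes every line has 5 fields).
C = vertcat(parts{:})
```

If lines can have different numbers of fields, vertcat will fail; in that case keep parts as a cell array of rows instead.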

Upvotes: 0

gnovice

Reputation: 125864

A simpler solution to parse your string would be to use the function REGEXP to find the indices where you have 2 or more whitespace characters in a row, use these indices to break your string up into a cell array of strings using the function MAT2CELL, then use the function STRTRIM to remove leading and trailing whitespace from each substring. For example:

>> str = 'DATE      TIMESTAMP    T W M     T AL M C  A_B_C';
>> cutPoints = regexp(str,'\s{2,}');
>> cellArr = mat2cell(str,1,diff([0 cutPoints numel(str)]));
>> cellArr = strtrim(cellArr)

cellArr = 

    'DATE'    'TIMESTAMP'    'T W M'    'T AL M C'    'A_B_C'
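To see why the final STRTRIM step is needed, it helps to look at the intermediate values for this particular 48-character string (worked out by hand for this input only). Each cut point is the first whitespace character of a run, so every piece except the last ends with a space, and every piece except the first starts with the rest of the preceding run:

```matlab
str = 'DATE      TIMESTAMP    T W M     T AL M C  A_B_C';

cutPoints = regexp(str, '\s{2,}')          % start of each run: [5 20 29 42]
lens = diff([0 cutPoints numel(str)])      % piece lengths: [5 15 9 13 6]
pieces = mat2cell(str, 1, lens);           % e.g. pieces{2} is '     TIMESTAMP '
cellArr = strtrim(pieces)
```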

Upvotes: 1

hgus1294

Reputation: 757

I think that you are making things too complicated. Just use:

fid = fopen('pathandnameoffile');
C = textscan(fid,'%s','Delimiter','\t');   % one cell per tab-separated field
fclose(fid);

The example above assumes that you have tabs as delimiters. Change it to something else if required.
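This only helps if the file really uses tab characters between fields; the sample line in the question appears to be padded with spaces, in which case a tab delimiter finds nothing to split on. A minimal sketch with a tab-separated line built with sprintf (an assumed input, not the asker's data); setting 'Whitespace' to '' keeps the single spaces inside each field untouched:

```matlab
% Hypothetical tab-separated version of the line from the question.
tabLine = sprintf('DATE\tTIMESTAMP\tT W M\tT AL M C\tA_B_C');

% With a tab as the only delimiter and no whitespace trimming,
% each field runs from one tab to the next, spaces included.
C = textscan(tabLine, '%s', 'Delimiter', '\t', 'Whitespace', '');
fields = C{1}'
```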

Upvotes: 0
