AnnaSchumann

Reputation: 1271

Accommodating blank entries in .txt files using textscan - MATLAB

I have a 9-column, tab-delimited .txt file containing mixed data types. Some entries in the 'type' column are empty.

id  id_2 s1      s2      st1     st2          type         desig  num
1   1   51371   51434   52858   52939   5:3_4:4_6:2_4:4_2:6 CO     1
2   1   108814  108928  109735  110856  5:3_4:4_6:2_4:4_2:7 CO     2
3   1   130975  131303  131303  132066  5:3_4:4_6:2_4:4_2:8 NCO    3
4   1   191704  191755  194625  194803                      NCO    4
5   2   69355   69616   69901   70006                       CO     5
6   2   202580  202724  204536  205151  5:3_4:4_6:2_4:4     CO     6

Because of the mixed data types, I've been using textscan to import this data:

data = textscan(fid1, '%*f %f %f %f %f %f %*s %s %*[^\r\n]','HeaderLines',1);

This takes columns 2-6, skips 'type', and takes the 8th column ('desig').

This approach fails on rows where 'type' is empty: textscan does not treat the empty entry as a column, so instead of reading 'NCO' or 'CO' it reads '4' or '5'.

Is there a way to prevent this? I know I could edit the original .txt files to put something like 'NA' in the empty entries, but I would prefer a more robust way to read such files.

EDIT:

In addition to the answer below, simply specifying the delimiter used appears to fix the issue:

data = textscan(fid1, '%*f %f %f %f %f %f %*s %s %*[^\r\n]','HeaderLines',1,'delimiter','\t');
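The likely reason this works: when no delimiter is given, textscan treats repeated white-space (tabs included) as a single delimiter, so the two tabs around an empty 'type' entry collapse into one. With 'Delimiter','\t' each tab counts separately and the empty field is kept. Below is a minimal, self-contained sketch of that behaviour. The temporary file and the two sample rows are assumptions made for illustration (copied from rows 3 and 4 above), not my real data.

```matlab
% Write a tiny tab-delimited sample to a temporary file (hypothetical data,
% copied from rows 3 and 4 of the example; row 4 has an empty 'type' field).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'id\tid_2\ts1\ts2\tst1\tst2\ttype\tdesig\tnum\n');
fprintf(fid, '3\t1\t130975\t131303\t131303\t132066\t5:3_4:4_6:2_4:4_2:8\tNCO\t3\n');
fprintf(fid, '4\t1\t191704\t191755\t194625\t194803\t\tNCO\t4\n');
fclose(fid);

% Read columns 2-6 and column 8, with the tab declared as the delimiter
% so that the empty 'type' field still counts as a column.
fid1 = fopen(fname, 'r');
data = textscan(fid1, '%*f %f %f %f %f %f %*s %s %*[^\r\n]', ...
                'HeaderLines', 1, 'Delimiter', '\t');
fclose(fid1);
delete(fname);

id_2  = data{1};   % numeric column 2
desig = data{6};   % cell array of strings from column 8
disp(desig)
```

If this behaves as described, desig holds 'NCO' for both rows, including the row whose 'type' is empty.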

Upvotes: 1

Views: 50

Answers (1)

Divakar

Reputation: 221514

Here's one approach with importdata and strsplit -

%// Read in data with importdata
data = importdata('data1.txt') %// 'data1.txt' is the input text file

%// Split data
split_data = cellfun(@(x) strsplit(x,' '),data,'Uni',0)

N = numel(split_data) %// number of rows in input textfile

%// Setup output cell and mask arrays
out_cell = cell(9,N)
mask = true(9,N)

%// Set the "type" row of the mask to false for every line of the text file
%// whose "type" entry is missing (lines that do not split into 9 fields)
mask(7,cellfun(@length,split_data)~=9)=0

%// Use mask to set cells in out_cell from split data entries
out_cell(mask) = [split_data{:}]
out = out_cell'

Sample run -

>> type data1.txt

id  id_2 s1      s2      st1     st2          type         desig  num
1   1   51371   51434   52858   52939   5:3_4:4_6:2_4:4_2:6 CO     1
2   1   108814  108928  109735  110856  5:3_4:4_6:2_4:4_2:7 CO     2
3   1   130975  131303  131303  132066  5:3_4:4_6:2_4:4_2:8 NCO    3
4   1   191704  191755  194625  194803                      NCO    4
5   2   69355   69616   69901   70006                       CO     5
6   2   202580  202724  204536  205151  5:3_4:4_6:2_4:4     CO     6
>> out
out = 
    'id'    'id_2'    's1'        's2'        'st1'       'st2'       'type'                   'desig'    'num'
    '1'     '1'       '51371'     '51434'     '52858'     '52939'     '5:3_4:4_6:2_4:4_2:6'    'CO'       '1'  
    '2'     '1'       '108814'    '108928'    '109735'    '110856'    '5:3_4:4_6:2_4:4_2:7'    'CO'       '2'  
    '3'     '1'       '130975'    '131303'    '131303'    '132066'    '5:3_4:4_6:2_4:4_2:8'    'NCO'      '3'  
    '4'     '1'       '191704'    '191755'    '194625'    '194803'                       []    'NCO'      '4'  
    '5'     '2'       '69355'     '69616'     '69901'     '70006'                        []    'CO'       '5'  
    '6'     '2'       '202580'    '202724'    '204536'    '205151'    '5:3_4:4_6:2_4:4'        'CO'       '6' 
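
As a side note, if the file really is tab-delimited as stated in the question (the sample above is shown with spaces), the empty field can also be kept at the split step itself, without a mask. This is only a sketch under that assumption: the sample line is built by hand from row 4, and the variable names are made up for illustration.

```matlab
% One line of the file with real tabs and an empty 'type' field
% (hypothetical sample, built from row 4 of the question's data).
line = sprintf('4\t1\t191704\t191755\t194625\t194803\t\tNCO\t4');

% By default strsplit collapses consecutive delimiters into one, which would
% drop the empty field. Turning that off keeps it as an empty string.
fields = strsplit(line, '\t', 'CollapseDelimiters', false);

n_fields   = numel(fields);        % number of columns found on this line
type_empty = isempty(fields{7});   % true when 'type' is missing
desig      = fields{8};            % the 'desig' entry
```

With collapsing turned off, every line splits into the same number of fields, so the rows could be stacked directly instead of being placed through a mask.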

Upvotes: 1
