I have a 9-column tab-delimited .txt file containing a mix of data formats. Some entries in the 'type' column are empty, however.
id id_2 s1 s2 st1 st2 type desig num
1 1 51371 51434 52858 52939 5:3_4:4_6:2_4:4_2:6 CO 1
2 1 108814 108928 109735 110856 5:3_4:4_6:2_4:4_2:7 CO 2
3 1 130975 131303 131303 132066 5:3_4:4_6:2_4:4_2:8 NCO 3
4 1 191704 191755 194625 194803 NCO 4
5 2 69355 69616 69901 70006 CO 5
6 2 202580 202724 204536 205151 5:3_4:4_6:2_4:4 CO 6
Because of the mixed format types, I've been using textscan to import this data:
data = textscan(fid1, '%*f %f %f %f %f %f %*s %s %*[^\r\n]','HeaderLines',1);
This is meant to take columns 2-6, skip 'type', and take the 8th column ('desig').
This approach fails on rows where 'type' is empty: textscan skips the empty entry as if it were not a column at all, so instead of reading 'NCO' or 'CO' it reads '4' or '5'.
Is there a way to prevent this? I know I could alter the original .txt files to include something like 'NA' for empty entries but this is less desirable than a more robust way to read such files.
EDIT:
In addition to the answer below, simply specifying the delimiter used appears to fix the issue:
data = textscan(fid1, '%*f %f %f %f %f %f %*s %s %*[^\r\n]','HeaderLines',1,'delimiter','\t');
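A self-contained way to check this behaviour is sketched below. It assumes a scratch file in tempdir (the file name is hypothetical) containing two of the rows above, one of them with an empty 'type' field, and reads it back with the same format string plus an explicit tab delimiter. The likely reason it helps is that, without an explicit delimiter, runs of tabs are treated as ordinary whitespace between fields, so an empty field cannot be detected.

```matlab
% Write a small tab-delimited sample file (hypothetical name) where the
% second data row has an empty 'type' field (two consecutive tabs).
fname = fullfile(tempdir, 'textscan_demo.txt');
fid = fopen(fname, 'w');
fprintf(fid, 'id\tid_2\ts1\ts2\tst1\tst2\ttype\tdesig\tnum\n');
fprintf(fid, '3\t1\t130975\t131303\t131303\t132066\t5:3_4:4_6:2_4:4_2:8\tNCO\t3\n');
fprintf(fid, '4\t1\t191704\t191755\t194625\t194803\t\tNCO\t4\n');
fclose(fid);

% Read it back with the tab declared as the delimiter, so that the empty
% field between two tabs counts as a column.
fid1 = fopen(fname, 'r');
data = textscan(fid1, '%*f %f %f %f %f %f %*s %s %*[^\r\n]', ...
                'HeaderLines', 1, 'Delimiter', '\t');
fclose(fid1);
delete(fname);     % remove the scratch file

desig = data{6};   % sixth returned cell holds the 'desig' column
disp(desig)
```

If the delimiter is picked up as intended, desig should contain the 'desig' entry for both rows rather than the trailing 'num' value.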
Upvotes: 1
Views: 50
Reputation: 221514
Here's one approach with importdata and strsplit:
%// Read in the lines of the file with importdata ('data1.txt' is the input text file)
data = importdata('data1.txt');
%// Split each line into fields (use '\t' instead of ' ' if the file is tab-delimited)
split_data = cellfun(@(x) strsplit(x,' '),data,'UniformOutput',false);
N = numel(split_data); %// number of rows in the input text file
%// Set up output cell and mask arrays (9 columns per row)
out_cell = cell(9,N);
mask = true(9,N);
%// Clear the "type" entry (column 7) in the mask for rows of the text file
%// that do not split into all 9 fields, i.e. rows where that entry is missing
mask(7,cellfun(@length,split_data)~=9) = false;
%// Use the mask to fill out_cell from the split entries, then transpose to rows
out_cell(mask) = [split_data{:}];
out = out_cell';
Sample run -
>> type data1.txt
id id_2 s1 s2 st1 st2 type desig num
1 1 51371 51434 52858 52939 5:3_4:4_6:2_4:4_2:6 CO 1
2 1 108814 108928 109735 110856 5:3_4:4_6:2_4:4_2:7 CO 2
3 1 130975 131303 131303 132066 5:3_4:4_6:2_4:4_2:8 NCO 3
4 1 191704 191755 194625 194803 NCO 4
5 2 69355 69616 69901 70006 CO 5
6 2 202580 202724 204536 205151 5:3_4:4_6:2_4:4 CO 6
>> out
out =
'id' 'id_2' 's1' 's2' 'st1' 'st2' 'type' 'desig' 'num'
'1' '1' '51371' '51434' '52858' '52939' '5:3_4:4_6:2_4:4_2:6' 'CO' '1'
'2' '1' '108814' '108928' '109735' '110856' '5:3_4:4_6:2_4:4_2:7' 'CO' '2'
'3' '1' '130975' '131303' '131303' '132066' '5:3_4:4_6:2_4:4_2:8' 'NCO' '3'
'4' '1' '191704' '191755' '194625' '194803' [] 'NCO' '4'
'5' '2' '69355' '69616' '69901' '70006' [] 'CO' '5'
'6' '2' '202580' '202724' '204536' '205151' '5:3_4:4_6:2_4:4' 'CO' '6'
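As a possible variation on the split step, and not part of the approach above, strsplit can be asked to keep empty fields through its 'CollapseDelimiters' option, in which case every row splits into 9 entries and no mask is needed. The sketch below assumes a tab-delimited line built by hand for illustration.

```matlab
% One row with an empty 'type' field (two consecutive tabs), built by hand.
line = sprintf('4\t1\t191704\t191755\t194625\t194803\t\tNCO\t4');

% With CollapseDelimiters set to false, consecutive tabs produce an empty
% entry instead of being merged into a single separator.
fields = strsplit(line, '\t', 'CollapseDelimiters', false);

fprintf('%d fields, desig = %s\n', numel(fields), fields{8});
```

The empty 'type' entry then appears as an empty char array in position 7, which can be tested with isempty.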
Upvotes: 1