I am trying to read a tab-delimited text file and store certain parts of the data in different fields in a MATLAB structure

Question

I am attempting to read a plain-text, tab-delimited file. I need to read both the strings and the numerical values. The issue is that the table doesn't start until the third row: the first two rows contain version information and the size of the data. When I try the usual functions such as load and importdata, I get an error message saying that the number of columns in line 2 does not match the number of columns in line 1.

I have written some code to read the file line by line, which I have included below. I need to figure out how to make a structure with 4 fields.

This is my code

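fid = fopen('gene_expr_500x204.gct');
s = struct([]);   % empty struct array, to be filled in later
% Read the first 3 lines. The first 2 (version and dimensions) are discarded;
% after the loop, tline holds the 3rd line (the start of the table).
for k = 1:3
    tline = fgets(fid);
end
% Display everything from the 3rd line onward.
% fgets returns -1 (not a char) at end of file, which ends the loop.
while ischar(tline)
    disp(tline)
    tline = fgets(fid);
end
fclose(fid);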

Any help would be much appreciated, thank you in advance!

Suever · Accepted Answer

You can use textscan to parse this file format. The second line of the file states how many rows and columns to expect, so we read that first. We then read the headers into a cell array, build a custom format spec that matches each of the remaining rows, and read in the rest of the file. Finally, we combine the headers with the data to construct a struct whose field names match the headers.

This solution is flexible because it reads the number of columns from the file itself rather than hard-coding a particular value.

fid = fopen('filename.txt', 'r');

% Skip the first line and determine the number of rows and number of samples
dims = textscan(fid, '%d', 2, 'HeaderLines', 1);
ncols = dims{1}(2);

% Now read the variable names
varnames = textscan(fid, '%s', 2 + ncols);
varnames = varnames{1};

% Now create the format spec for your data (2 strings and the rest floats)
spec = ['%s%s', repmat('%f', [1 ncols])];

% Read in all of the data using this custom format specifier. The delimiter is a tab
data = textscan(fid, spec, 'Delimiter', '\t');
fclose(fid);

% Place the data into a struct where the variable names are the fieldnames
inputs = cat(1, varnames(:)', data);
S = struct(inputs{:});

%   7x1 struct array with fields:
%
%   Name
%   Desc
%   A2
%   B2
%   C2
%   D2
%   E2
%   F2
%   G2
%   H2
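
One detail of the struct call above is worth spelling out: struct only spreads a value across the elements of the array when that value is a cell array. The two string columns come back from textscan as cell arrays, but the numeric columns are plain double vectors, so each element of S ends up holding the whole numeric column. The following is a minimal sketch of two alternative layouts, not part of the original answer. It uses small hand-built stand-ins for varnames and data (the values are made up) instead of reading a file, and the names Srow and Scol are hypothetical. The makeValidName call is an optional extra that assumes R2014a or later; it matters only if the file's headers are not valid MATLAB identifiers.

```matlab
% Hand-built stand-ins for the textscan output (made-up values).
varnames = {'Name'; 'Desc'; 'A2'; 'B2'};
data = {{'gene1'; 'gene2'; 'gene3'}, ...
        {'first'; 'second'; 'third'}, ...
        [1.5; 4.0; 7.0], ...
        [2.5; 5.0; 8.0]};

% Optional: coerce headers into valid field names (these are already valid,
% so they are returned unchanged).
varnames = matlab.lang.makeValidName(varnames);

% Same construction as above: a 3x1 struct array. The numeric columns are
% not cell arrays, so every element receives the entire column.
inputs = cat(1, varnames(:)', data);
S = struct(inputs{:});

% Layout 1: one struct per row with scalar numeric fields.
% num2cell turns each numeric column into an N-by-1 cell array, which
% struct then distributes across the elements.
isNum = cellfun(@isnumeric, data);
perRow = data;
perRow(isNum) = cellfun(@num2cell, data(isNum), 'UniformOutput', false);
inputsRow = cat(1, varnames(:)', perRow);
Srow = struct(inputsRow{:});

% Layout 2: a single scalar struct whose fields hold whole columns.
Scol = cell2struct(data(:), varnames(:), 1);
```

With Layout 1, a single value is addressed as Srow(2).A2; with Layout 2, the same value is Scol.A2(2). Which one fits better depends on whether the data will mostly be processed row by row or column by column.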
