embryo3699

Reputation: 55

I am trying to read a tab-delimited text file and store certain parts of the data in different fields of a MATLAB structure

I am attempting to read a plain-text, tab-delimited file. I need to read both the strings and the numerical values. The issue is that the table doesn't start until the third row; the first two rows contain version information and the size of the data. When I try the usual methods such as load and importdata, I get an error message saying that the number of columns in line 2 does not match the number of columns in line 1.

I have written some code that reads the file line by line, shown below. I need to figure out how to make a structure with 4 fields.

This is my code:

fid = fopen('gene_expr_500x204.gct');
s = struct([]);  % empty struct array, to be filled in later
% read the first 3 lines; lines 1 and 2 are discarded and tline ends up holding line 3
for k=1:3
    tline = fgets(fid);
end
% tline now holds the 3rd line.
% display from 3rd line forward:
while ischar(tline)
    disp(tline)
    tline = fgets(fid);
end
fclose(fid);

Any help would be much appreciated, thank you in advance!

Upvotes: 3

Views: 206

Answers (2)

Novice_Developer

Reputation: 1492

Use the following method:

fileID = fopen('gene_expr_500x204.gct','r');
C = textscan(fileID,'%s%s%s%s') % assuming you have 4 columns
fclose(fileID);

and you can separate the data by using

numericaldata = str2double(C{1}(3:end))
stringdata = C{1}(1:2)

If the number of columns is unknown, just use

 delimiter = sprintf('\t'); % '\t' in single quotes is a literal backslash and t, so build a real tab
 fid = fopen('testtext3.txt','rt');
 tLines = fgets(fid);
 numCols = numel(strfind(tLines,delimiter)) + 1;
 formatSpec = repmat('%s',1,numCols)

or, if the number of columns is known, just use

KnownColumns = 206;
formatSpec = repmat('%s',1,KnownColumns)
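
As a minimal sketch of the column-counting idea, here it is applied to an in-memory line rather than a file. The sample header line is made up for illustration. Note that in MATLAB a single-quoted '\t' is two literal characters, so the tab has to be produced with sprintf('\t') or char(9) before it is passed to strfind.

```matlab
% Made-up header line with three tab-separated columns
headerLine = sprintf('Name\tDescription\tA2');

% A real tab character (equivalent to char(9))
tab = sprintf('\t');

% Number of columns = number of tabs + 1
numCols = numel(strfind(headerLine, tab)) + 1;

% One '%s' per column, ready to be passed to textscan
formatSpec = repmat('%s', 1, numCols);
```

The resulting formatSpec reads every column as a string, so numeric columns would still need str2double afterwards.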

UPDATE: About your second question: you can store any data type in a structure field. An example is given below.

a = {[1 2 3 ],'CO'}

a = 

    [1x3 double]    'CO'

b = table([1 2 3].','VariableNames',{'Heading'})

b = 

    Heading
    _______

    1      
    2      
    3      

c = [1 2 3;4 5 6]

c =

     1     2     3
     4     5     6

Struc(1).DataTypes = a

Struc(2).DataTypes = b

Struc(3).DataTypes = c

struct2table(Struc)

ans = 

     DataTypes  
    ____________

    {1x2 cell  }
    [3x1 table ]
    [2x3 double]

Upvotes: 1

Suever

Reputation: 65430

You can use textscan to parse this file format. The second line of the file states how many rows and columns to expect, so we read that first. We then read the headers into a cell array, build a custom format spec for the remaining rows, and read in the rest of the file. Finally, we combine the headers with the data to construct a struct whose fields match the headers.

This solution is flexible as it actually parses the file format itself to determine the number of columns rather than hard-coding a particular value in.

fid = fopen('filename.txt', 'r');

% Skip the first line and determine the number of rows and number of samples
dims = textscan(fid, '%d', 2, 'HeaderLines', 1);
ncols = dims{1}(2);

% Now read the variable names
varnames = textscan(fid, '%s', 2 + ncols);
varnames = varnames{1};

% Now create the format spec for your data (2 strings and the rest floats)
spec = ['%s%s', repmat('%f', [1 ncols])];

% Read in all of the data using this custom format specifier. The delimiter is a tab
data = textscan(fid, spec, 'Delimiter', '\t');
fclose(fid);

% Place the data into a struct where the variable names are the fieldnames
inputs = cat(1, varnames(:)', data);
S = struct(inputs{:});

%   7x1 struct array with fields:
%
%   Name
%   Desc
%   A2
%   B2
%   C2
%   D2
%   E2
%   F2
%   G2
%   H2
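
To illustrate what the struct(inputs{:}) call produces, here is a small sketch using made-up stand-ins for varnames and data (two rows, two samples) instead of a real file. It also shows one possible way to get a single struct with four fields, as asked in the question. The field names geneNames, geneDescs, sampleNames and values are hypothetical, not part of the file format.

```matlab
% Made-up stand-ins for the outputs of the textscan calls
varnames = {'Name'; 'Description'; 'A2'; 'B2'};
data = {{'gene1'; 'gene2'}, {'desc1'; 'desc2'}, [1.5; 4.5], [2.5; 5.5]};

% struct() expands cell-array values into a struct array (one element per
% row), while non-cell values are copied whole into every element
inputs = cat(1, varnames(:)', data);
S = struct(inputs{:});

% Alternative: one scalar struct with four fields (hypothetical names)
G = struct();
G.geneNames   = data{1};            % cell array of row names
G.geneDescs   = data{2};            % cell array of row descriptions
G.sampleNames = varnames(3:end);    % cell array of sample headers
G.values      = [data{3:end}];      % numeric matrix, rows x samples
```

With the struct-array form, S(2).Name is a single string, but S(1).A2 holds the entire A2 column because numeric inputs are not split per element. The scalar form keeps all numeric data in one matrix, which may be more convenient for indexing by row and sample.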

Upvotes: 2
