Reputation: 55
I am trying to read a plain-text, tab-delimited file, and I need both the strings and the numerical values. The problem is that the table doesn't start until the third row; the first two rows contain version information and the size of the data. When I use the usual methods such as load and importdata, I get an error saying that the number of columns in line 2 does not match the number of columns in line 1.
I have written some code to read the file line by line, which I have included below. I need to figure out how to make a structure with 4 fields.
This is my code:
fid = fopen('gene_expr_500x204.gct');
s = struct([]);  % empty struct array, to be filled in later
% Read the first 3 lines: the two header lines are discarded and
% tline ends up holding the 3rd line
for k = 1:3
    tline = fgets(fid);
end
% Display from the 3rd line forward
while ischar(tline)
    disp(tline)
    tline = fgets(fid);
end
fclose(fid);
Any help would be much appreciated, thank you in advance!
Upvotes: 3
Views: 206
Reputation: 1492
Use the following method:
fileID = fopen('gene_expr_500x204.gct','r');
C = textscan(fileID,'%s%s%s%s') % assuming you have 4 columns
and you can separate the data by using
numericaldata = str2double(C{1}(3:end))
stringdata = C{1}(1:2)
If the number of columns is unknown, just use
delimiter = sprintf('\t'); % an actual tab character; '\t' on its own is a literal backslash followed by t
fid = fopen('testtext3.txt','rt');
tLines = fgets(fid);
numCols = numel(strfind(tLines,delimiter)) + 1;
formatSpec = repmat('%s',1,numCols)
or, if the number of columns is known, just
KnownColumns = 206;
formatSpec = repmat('%s',1,KnownColumns)
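As a rough sketch of how these pieces could fit together, the example below counts the tabs in the first line, builds the format spec, and passes it to textscan. The temporary file and its contents are made up purely so the example is self-contained, and it assumes the first line has as many tabs as the data rows. It uses sprintf('\t') to get an actual tab character for strfind, and fgetl rather than fgets so the newline is not part of the line.

```matlab
% Hypothetical sample file, written only so the example is self-contained
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'Name\tDesc\tA2\tB2\n');
fprintf(fid, 'g1\td1\t1.5\t2.5\n');
fprintf(fid, 'g2\td2\t3.5\t4.5\n');
fclose(fid);

tab = sprintf('\t');                           % an actual tab character
fid = fopen(fname, 'rt');
firstLine = fgetl(fid);                        % first line, newline removed
numCols = numel(strfind(firstLine, tab)) + 1;  % tabs + 1 = number of columns
frewind(fid);                                  % back to the start of the file
formatSpec = repmat('%s', 1, numCols);
C = textscan(fid, formatSpec, 'Delimiter', '\t');
fclose(fid);
delete(fname);

names   = C{1}(2:end);               % text from column 1, header row skipped
numbers = str2double(C{3}(2:end));   % numeric values from column 3
```

Every column comes back as a cell array of strings here, so str2double is still needed for the numeric columns.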
UPDATE: About your second question: you can store any data type in a structure field. An example is given below.
a = {[1 2 3 ],'CO'}
a =
[1x3 double] 'CO'
b = table([1 2 3].','VariableNames',{'Heading'})
b =
Heading
_______
1
2
3
c = [1 2 3;4 5 6]
c =
1 2 3
4 5 6
Struc(1).DataTypes = a
Struc(2).DataTypes = b
Struc(3).DataTypes = c
struct2table(Struc)
ans =
DataTypes
____________
{1x2 cell }
[3x1 table ]
[2x3 double]
Upvotes: 1
Reputation: 65430
You can use textscan to parse this file format. Using the file's own header, we can read in how many rows and columns are expected. We can then read the headers and place these into a cell array. Then we can create a custom format spec for each of the remaining rows and read in the rest of the file. Once we have that, we can combine the headers with the data to construct a struct with fields that match the headers.
This solution is flexible as it actually parses the file format itself to determine the number of columns rather than hard-coding a particular value in.
fid = fopen('filename.txt', 'r');
% Skip the first line and determine the number of rows and number of samples
dims = textscan(fid, '%d', 2, 'HeaderLines', 1);
ncols = dims{1}(2);
% Now read the variable names
varnames = textscan(fid, '%s', 2 + ncols);
varnames = varnames{1};
% Now create the format spec for your data (2 strings and the rest floats)
spec = ['%s%s', repmat('%f', [1 ncols])];
% Read in all of the data using this custom format specifier. The delimiter will be a tab
data = textscan(fid, spec, 'Delimiter', '\t');
fclose(fid);
% Place the data into a struct where the variable names are the fieldnames
inputs = cat(1, varnames(:)', data);
S = struct(inputs{:});
% 7x1 struct array with fields:
%
% Name
% Desc
% A2
% B2
% C2
% D2
% E2
% F2
% G2
% H2
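The result above is a struct array because struct treats a cell-array value (here the Name and Desc columns) as a request for one struct element per cell. If a single struct whose fields each hold a whole column is preferred, one possible alternative is cell2struct. The sketch below uses made-up stand-ins for varnames and data so that it runs on its own, and it assumes the header names are valid MATLAB field names.

```matlab
% Stand-ins for the varnames and data variables returned by textscan
% (values are made up for illustration)
varnames = {'Name'; 'Desc'; 'A2'; 'B2'};
data = {{'g1'; 'g2'}, {'d1'; 'd2'}, [1.5; 3.5], [2.5; 4.5]};

% One scalar struct: each field holds an entire column
T = cell2struct(data(:), varnames(:), 1);

secondName = T.Name{2};   % name in the second row
secondA2   = T.A2(2);     % value of sample A2 in the second row
```

With this layout a whole sample column is available as T.A2, while the struct-array form above is more convenient for indexing by row.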
Upvotes: 2