aliocee

Reputation: 730

Loading text file in MATLAB?

I have a comma-separated file with 182 rows and 501 columns. The first 500 columns are numeric (features) and the last column contains strings (labels).

Example: 182x501 dimension

1,3,4,6,.........7, ABC
4,5,6,4,.........9, XYZ
3,4,5,3,.........2, ABC 

How can I load this file so that I get a matrix, B, containing the numbers as my features, and a vector, C, containing the strings as my labels, which I can then combine into a data set?

d = dataset(B, C);

Upvotes: 3

Views: 6426

Answers (3)

Andrew Janke

Reputation: 23908

Build a format specifier for textscan based on the number and types of columns, and have it read the file for you.

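nNumberCols = 500;
file = 'data.txt';  % placeholder: replace with the path to your file
% 500 copies of '%f,' followed by '%s' for the label column.
% Named fmt so it does not shadow the built-in format function.
fmt = [repmat('%f,', [1 nNumberCols]) '%s'];
fid = fopen(file);
x = textscan(fid, fmt);
fclose(fid);
B = cat(2, x{1:nNumberCols});  % numeric matrix, one row per line of the file
C = x{end};                    % cell array of label strings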

Upvotes: 4

anon


For example, if you have the following data in a file named data.txt:

1,3,4,6,7, ABC
4,5,6,4,9, XYZ
3,4,5,3,2, ABC 

you can read it into a matrix B and a cell array C using the code

N = 5; % Number of numeric values on each line
fid = fopen('data.txt');
B = []; C = {};
while ~feof(fid)  % repeat until end of file is reached
  b = fscanf(fid, '%f,', N); % read N comma-separated numbers as a column vector
  c = fscanf(fid, '%s', 1);  % read the label string
  B = [B, b];  % each line of the file becomes one column of B
  C = [C, c];
end
C
B
fclose(fid);

to give

C = 
  'ABC'    'XYZ'    'ABC'
B =
 1     4     3
 3     5     4
 4     6     5
 6     4     3
 7     9     2

Upvotes: 3

Chris

Reputation: 46366

You could use the textscan function. For example:

fid = fopen('test.dat');

% Read numbers and string into a cell array
data = textscan(fid, '%s %s');

% Then extract the numbers and strings into their own cell arrays
nums = data{1};
str  = data{2};

% Convert each comma-separated string of numbers to a numeric row vector
for i = 1:numel(nums)
    nums{i} = str2num(nums{i}); %#ok<ST2NM>
end

% Finally, convert cell array of numbers to a matrix
nums = cell2mat(nums);

fclose(fid);

Note that I have made a number of assumptions here, based on the file format you have specified. For example, I assume that there are no spaces after the commas following a number, but that there is a space immediately preceding the string at the end of each line.

You can make the above code more flexible by using a more considered format specifier (the second argument to textscan). See the section Basic Conversion Specifiers in the textscan documentation.
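
As a rough sketch of what a more considered format specifier might look like (an illustration only, not the only way to do it): read each numeric field with %f and the label with %s, and let the 'Delimiter' option handle the commas. The file name textscan_demo.csv, the variable names nNumeric and fmt, and the three sample rows are made up for this example. It assumes every line has the same number of numeric fields followed by a single label.

```matlab
% Write a tiny sample file so the sketch is self-contained
% (hypothetical file name, sample rows taken from the question's layout)
fname = fullfile(tempdir, 'textscan_demo.csv');
fid = fopen(fname, 'w');
fprintf(fid, '1,3,4,6,7, ABC\n4,5,6,4,9, XYZ\n3,4,5,3,2, ABC\n');
fclose(fid);

nNumeric = 5;                            % would be 500 for the file in the question
fmt = [repmat('%f', 1, nNumeric) '%s'];  % five numeric fields, then one string field

fid = fopen(fname, 'r');
data = textscan(fid, fmt, 'Delimiter', ',');  % commas separate the fields
fclose(fid);

B = [data{1:nNumeric}];    % numeric matrix, one row per line of the file
C = strtrim(data{end});    % cell array of labels, stray spaces removed
```

With this approach the numbers never pass through str2num, and a space after a comma no longer changes how a line is split.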

Upvotes: 3
