Reputation: 730
I have a comma-separated file with 182 rows and 501 columns, of which the first 500 columns are numeric (features) and the last column contains strings (labels).
Example: 182x501 dimension
1,3,4,6,.........7, ABC
4,5,6,4,.........9, XYZ
3,4,5,3,.........2, ABC
How can I load this file so that I end up with a data set made of a matrix, B, containing the numbers as my features, and a vector, C, containing the strings as my labels?
d = dataset(B, C);
Upvotes: 3
Views: 6426
Reputation: 23908
Build a format specifier for textscan based on the number and types of columns, and have it read the file for you.
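nNumberCols = 500;                            % number of numeric (feature) columns
fmt = [repmat('%f,', [1 nNumberCols]) '%s'];  % named fmt to avoid shadowing the built-in format
fid = fopen(file);                            % file holds the path to the CSV file
x = textscan(fid, fmt);                       % one cell per column
fclose(fid);
B = cat(2, x{1:nNumberCols});                 % numeric columns side by side (182x500)
C = x{end};                                   % cell array of label strings (182x1)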
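As a quick check of this approach, here is a minimal sketch run against the three-line, five-column sample rather than the full 182x501 file. The temporary file and the variable name `fname` are assumptions made only for illustration. It also passes the comma through the `'Delimiter'` option and uses `'CollectOutput'` so the numeric columns come back as one matrix, which is a variation on the code above rather than the same call.

```matlab
% Write a small sample file (hypothetical temporary file, for illustration only)
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '1,3,4,6,7, ABC\n4,5,6,4,9, XYZ\n3,4,5,3,2, ABC\n');
fclose(fid);

nNumberCols = 5;                               % 500 for the file in the question
fmt = [repmat('%f', 1, nNumberCols) '%s'];     % five numeric fields, then one string

fid = fopen(fname, 'r');
x = textscan(fid, fmt, 'Delimiter', ',', 'CollectOutput', true);
fclose(fid);
delete(fname);                                 % remove the temporary file

B = x{1};            % rows-by-nNumberCols numeric matrix
C = strtrim(x{2});   % rows-by-1 cell array of labels, stray blanks removed
```

The `strtrim` call is defensive: the sample has a space before each label, and trimming makes the result independent of how whitespace is handled during the read.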
Upvotes: 4
Reputation:
For example, if you have the following data in a file named data.txt:
1,3,4,6,7, ABC
4,5,6,4,9, XYZ
3,4,5,3,2, ABC
you can read it into a matrix B and a cell array C using the code
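N = 5; % Number of numeric values per line
fid = fopen('data.txt');
B = []; C = {};
while ~feof(fid) % repeat until end of file is reached
    b = fscanf(fid, '%f,', N); % read N numeric values, each followed by a comma
    c = fscanf(fid, '%s', 1); % read the label string
    if isempty(b), break; end % only trailing whitespace was left in the file
    B = [B, b]; % append the values as a new column
    C{end+1} = c; % append the label
end
C
B
fclose(fid);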
to give
C =
'ABC' 'XYZ' 'ABC'
B =
1 4 3
3 5 4
4 6 5
6 4 3
7 9 2
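Note that B here holds one column per line of the file, so for the file in the question it would be 500x182 rather than 182x500. If one row per line is wanted, a transpose is enough. A minimal sketch, with the values simply copied from the output above:

```matlab
% Values copied from the output shown above
B = [1 4 3; 3 5 4; 4 6 5; 6 4 3; 7 9 2];   % N-by-rows, as built by the loop
C = {'ABC', 'XYZ', 'ABC'};                  % 1-by-rows cell array

B = B.';    % now rows-by-N: one row per line of the file
C = C(:);   % now rows-by-1: one label per row of B
```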
Upvotes: 3
Reputation: 46366
You could use the textscan function. For example:
fid = fopen('test.dat');
% Read numbers and string into a cell array
data = textscan(fid, '%s %s');
% Then extract the numbers and strings into their own cell arrays
nums = data{1};
str = data{2};
% Convert string of numbers to numbers
for i = 1:length(str)
nums{i} = str2num(nums{i}); %#ok<ST2NM>
end
% Finally, convert cell array of numbers to a matrix
nums = cell2mat(nums);
fclose(fid);
Note that I have made a number of assumptions here, based on the file format you have specified. For example, I assume that there are no spaces after the commas following a number, but that there is a space immediately preceding the string at the end of each line.
You can make the above code more flexible by using a more considered format specifier (the second argument to textscan). See the section Basic Conversion Specifiers in the textscan documentation.
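One way to make the format specifier less hard-coded is to derive it from the file itself. The sketch below counts the commas on the first line to work out how many numeric columns there are, then builds the specifier from that count. It assumes every line has the same layout and that the label is the last field and contains no commas. The temporary file and the names `fname`, `firstLine` and `nNumeric` are illustrative only.

```matlab
% Write a small sample file (hypothetical temporary file, for illustration only)
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '1,3,4,6,7, ABC\n4,5,6,4,9, XYZ\n3,4,5,3,2, ABC\n');
fclose(fid);

fid = fopen(fname, 'r');
firstLine = fgetl(fid);                   % peek at the first line
nNumeric = sum(firstLine == ',');         % one comma after each numeric field
frewind(fid);                             % go back to the start before reading

fmt = [repmat('%f', 1, nNumeric) '%s'];   % numeric fields followed by the label
data = textscan(fid, fmt, 'Delimiter', ',');
fclose(fid);
delete(fname);                            % remove the temporary file

nums = [data{1:nNumeric}];                % rows-by-nNumeric numeric matrix
str = strtrim(data{end});                 % rows-by-1 cell array of labels
```

Because the numbers are read with `%f` directly, no `str2num` conversion loop is needed in this variant.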
Upvotes: 3