Reputation: 71
I want to generate a matrix of known dimensions in Octave. The problem is that I do not want to initialize the matrix with zeros. The matrix will only contain 0 or 1, but the elements (cells) that are never assigned a value must remain blank. I plan to use such a matrix in a 'Collaborative Filtering' algorithm.
I am new to both Octave and 'Collaborative Filtering' algorithms. I have tried to look for a solution online, but to no avail. Searching for "empty matrix" only turns up arrays with zero dimensions or char matrices with " " as values.
Upvotes: 2
Views: 6736
Reputation: 13091
The problem is that I do not want to initialize the matrix with zeros. The matrix will only contain 0 or 1 but the elements (cells) which do not get allocated any value, must remain blank.
You are wrong. You may think the matrix only contains the values 0 or 1, but each element actually has one of three states: 0, 1, or unset. You can't have a blank value; every element always needs some value. Or at least not blank in the way you are thinking. Taken to a very low level, all bits need to have a value (0 or 1); they can't be blank. Therefore, if you want a blank value, you need to interpret some value as blank.
Your data will then have 3 states: true, false, and blank. You will therefore need at least 2 bits per point (note that even logical/bool data types, which only need 1 bit, actually take up 8 bits (1 byte)).
Using NaN as the blank value may look like the simplest solution, but it's actually pretty bad. It will be a huge waste of memory (and you will have very large matrices if you're doing collaborative filtering).
The reason is that if you use NaN, your data needs to be of type single or double. That's 32 or 64 bits per element, respectively. Remember that you only actually need 2 bits. Of course, you could make your own data type that does have a NaN value.
octave> vals = NaN (3, 3) # 3x3 matrix of type double (default)
vals =
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
octave> vals = NaN (3, 3, "single") # 3x3 matrix of type single
vals =
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
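To put rough numbers on the memory argument, here is a small sketch that stores the same 100x100 shape as double, single, and int8 and compares the sizes. The variable names are made up for illustration, and it relies on whos returning a struct with a "bytes" field when called with an output argument.

```octave
# Same 100x100 shape stored in three different types
# (variable names are illustrative only).
as_double = NaN (100, 100);            # 8 bytes per element
as_single = NaN (100, 100, "single");  # 4 bytes per element
as_int8   = zeros (100, 100, "int8");  # 1 byte per element, no NaN available

# whos called with an output argument returns a struct with a "bytes" field
info = whos ("as_double");  bytes_double = info.bytes;
info = whos ("as_single");  bytes_single = info.bytes;
info = whos ("as_int8");    bytes_int8   = info.bytes;

printf ("double: %d, single: %d, int8: %d bytes\n", ...
        bytes_double, bytes_single, bytes_int8);
```

For a real collaborative filtering matrix the same ratios apply, only with far more elements.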
A cell array is a data type where each cell can hold any Octave value. This includes another cell array, a matrix of any dimensions, or even an empty array. You could use an empty array as blank, but this will be terribly inefficient, both for memory and speed, and you won't be able to use most functions, since they work on numeric arrays, not cell arrays.
octave> vals = cell (3, 3); # create 3x3 cell matrix
octave> vals{2,3} = true; # set value
octave> vals{2,3} = false; # set value
octave> vals{2,3} = []; # unset value
octave> cumsum (vals)
error: cumsum: wrong type argument 'cell'
octave> nnz (vals)
error: nnz: wrong type argument 'cell array'
octave> find (vals)
error: find: wrong type argument 'cell'
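If you do go with a cell array anyway, the usual workaround for the errors above is to build a numeric or logical mask first. The following is only a sketch with illustrative variable names; it uses cellfun with isempty to locate the blank cells.

```octave
vals = cell (3, 3);   # every cell starts out as an empty array
vals{2,3} = true;     # set value
vals{1,1} = false;    # set value

# Logical 3x3 mask: true where the cell is still blank
is_blank = cellfun (@isempty, vals);
n_blank  = nnz (is_blank);          # nnz works on the mask, not on the cell array

# Pull the assigned values out into an ordinary logical row vector
set_values = [vals{~is_blank}];
n_true     = nnz (set_values);
```

Every operation has to go through a conversion like this, which is part of why the approach is slow.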
An 8-bit integer matrix is what I see being used most often. Using a signed 8-bit type (int8), you can use 0 for blank, -1 for false, and 1 for true (or whatever makes the most sense for you).
octave> vals = zeros (3, 3, "int8");
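As a minimal sketch of working with that encoding (0 for blank, -1 for false, 1 for true, which is only one possible convention), plain comparisons are enough to separate the three states. The variable names are illustrative.

```octave
vals = zeros (3, 3, "int8");   # everything starts out blank (0)
vals(2,3) = 1;                 # true
vals(1,1) = -1;                # false

# Comparisons give logical masks that the usual numeric functions accept
is_blank = (vals == 0);
n_true   = nnz (vals == 1);
n_false  = nnz (vals == -1);
n_blank  = nnz (is_blank);

# Linear indices of the elements that have been set
set_idx = find (~is_blank);
```

Unsetting a value is then just assigning 0 again, for example vals(2,3) = 0.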
If you really, really want to have a matrix of 0 and 1, then you need a separate matrix to keep track of which values have been set. In that case, both matrices can be of type logical, each taking up 8 bits per data point, for a total of 16 bits per data point. It also has the problem that you need to keep the two matrices in sync.
octave> vals = false (3, 3);
octave> set_vals = false (size (vals));
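The syncing burden looks roughly like this in practice: every assignment touches both matrices. This is a sketch under the assumption that an unset element is also reset to false in vals; the variable names beyond vals and set_vals are illustrative.

```octave
vals     = false (3, 3);
set_vals = false (size (vals));

# Setting a value means updating both matrices
vals(2,3) = true;   set_vals(2,3) = true;    # true
vals(1,1) = false;  set_vals(1,1) = true;    # false, but explicitly set
vals(3,2) = true;   set_vals(3,2) = true;    # true ...
vals(3,2) = false;  set_vals(3,2) = false;   # ... and then unset again

# Combine the two masks to recover the three states
known_true  = vals & set_vals;
known_false = ~vals & set_vals;
is_blank    = ~set_vals;

n_true  = nnz (known_true);
n_false = nnz (known_false);
n_blank = nnz (is_blank);
```

Forgetting one of the paired assignments leaves the matrices inconsistent, which is the risk mentioned above.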
Using either the new classdef (which requires Octave 4.0.0) or the old @class type, you can encapsulate any of the strategies above (I would personally use an 8-bit integer) in its own class. If you use a signed 8-bit integer, this moves the logic of knowing which value (-1 or 0) means blank into the class. Or, if you prefer to use a separate matrix for blank values, you can move the logic of keeping the two matrices in sync into a setter method.
Upvotes: 1
Reputation: 65430
A numeric array cannot hold empty values. Typically in this case, people will use NaN as a placeholder value.
%// Initialize a 3D matrix of NaN values
data = nan(2, 3, 4);
size(data)
%// 2 3 4
It is then easy to differentiate a placeholder value from real data. You can detect placeholders using isnan.
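As a rough illustration of that detection step (the variable names and the two assigned entries are made up for the example), isnan gives a logical mask that can be used to count the placeholders or to index only the real data:

```octave
%// Start from all placeholders, then fill in two real values
data = nan(2, 3, 4);
data(1, 2, 3) = 1;
data(2, 1, 1) = 0;

%// Logical mask, same size as data: true where the value is still a placeholder
is_missing = isnan(data);
n_missing  = nnz(is_missing);

%// Keep only the real data (returned as a column vector)
observed      = data(~is_missing);
mean_observed = mean(observed);
```

Here 22 of the 24 elements are still placeholders, and the mean is computed over the two observed values only.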
The only way to create an array of empty values (and it is highly discouraged due to the performance hit) is to use cell arrays.
data = cell(2, 3, 4);
Upvotes: 4