Reputation: 71

Generating an empty matrix of known dimensions

I want to generate a matrix of known dimensions in Octave. The problem is that I do not want to initialize the matrix with zeros. The matrix will only contain 0 or 1, but the elements (cells) that do not get assigned any value must remain blank. I plan to use such a matrix in a 'Collaborative Filtering' algorithm.

I am new to both Octave and 'Collaborative Filtering' algorithms. I have tried to look for a solution on the net, but to no avail. Searching for the keywords "empty matrix" only turns up arrays with zero dimensions or char matrices with " " as values.

Upvotes: 2

Answers (2)

carandraug

Reputation: 13091

The problem is that I do not want to initialize the matrix with zeros. The matrix will only contain 0 or 1 but the elements (cells) which do not get allocated any value, must remain blank.

You are wrong. You may think the matrix only contains the values 0 or 1, but it actually holds 0, 1, or unset. You can't have a blank value; you always need some value, or at least not blank in the way you are thinking. Taken to a very low level, every bit needs to have a value (0 or 1); it can't be blank. Therefore, if you want a blank value, you need to interpret some value as blank.

Your data will then have 3 states: true, false, and blank. You will therefore need at least 2 bits per point (note that even logical/bool data types, which only need 1 bit, actually take up 8 bits, i.e. 1 byte).

Using NaN

This may look like the simplest solution, but it's actually pretty bad. It wastes a huge amount of memory (and you will have very large matrices if you're doing collaborative filtering).

The reason is that NaN only exists for floating-point types, so if you use NaN your data needs to be of type single or double. That's 32 or 64 bits per element, respectively. Remember that you only actually need 2 bits. Of course, you could make your own data type that does have a NaN value.

octave> vals = NaN (3, 3) # 3x3 matrix of type double (default)
vals =

   NaN   NaN   NaN
   NaN   NaN   NaN
   NaN   NaN   NaN

octave> vals = NaN (3, 3, "single") # 3x3 matrix of type single
vals =

   NaN   NaN   NaN
   NaN   NaN   NaN
   NaN   NaN   NaN
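To put numbers on the memory argument, here is a small sketch that uses Octave's sizeof to report how many bytes a 3x3 matrix occupies in each candidate type. The variable names are illustrative only; the per-element sizes are the standard ones for each class.

```octave
% Compare the bytes used by a 3x3 matrix in each candidate type.
% sizeof returns the number of bytes of data held by a variable.
n = 3;
bytes_double  = sizeof (NaN (n, n));            % 8 bytes per element
bytes_single  = sizeof (NaN (n, n, "single"));  % 4 bytes per element
bytes_int8    = sizeof (zeros (n, n, "int8"));  % 1 byte per element
bytes_logical = sizeof (false (n, n));          % 1 byte per element, although only 1 bit is used
printf ("double: %d, single: %d, int8: %d, logical: %d bytes\n", ...
        bytes_double, bytes_single, bytes_int8, bytes_logical);
```

Even the single version uses four times the memory of an int8 or logical matrix of the same size.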

Using a cell matrix

A cell array is a data type where each cell can hold any Octave value. This includes another cell array, a matrix of any dimensions, or even an empty array. You could use an empty array as blank, but this will be terribly inefficient, both in memory and speed, and you won't be able to use most functions, since they operate on numeric arrays, not cell arrays.

octave> vals = cell (3, 3); # create 3x3 cell matrix
octave> vals{2,3} = true; # set value
octave> vals{2,3} = false; # set value
octave> vals{2,3} = []; # unset value
octave> cumsum (vals)
error: cumsum: wrong type argument 'cell'
octave> nnz (vals)
error: nnz: wrong type argument 'cell array'
octave> find (vals)
error: find: wrong type argument 'cell'
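A minimal sketch of what working with such a cell array looks like in practice: blanks have to be located with cellfun, which visits every cell individually, and the numeric functions above only become usable after converting back to a numeric array. The choice of NaN as the placeholder during that conversion is an arbitrary assumption for illustration.

```octave
% A 3x3 cell array where empty cells ([]) stand for "blank".
vals = cell (3, 3);
vals{1,1} = true;
vals{2,3} = false;

% Blank detection needs cellfun, which calls isempty on every cell.
is_blank = cellfun (@isempty, vals);

% To use numeric functions (nnz, find, cumsum), convert to a numeric array.
% This forces you to pick a placeholder for the blanks anyway (NaN here).
numeric = NaN (size (vals));
numeric(~is_blank) = [vals{~is_blank}];
```

In other words, the cell array does not remove the need for a placeholder value; it only postpones it.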

Using an 8-bit integer

This is what I see used most often. With a signed 8-bit integer, you can use 0 for blank, -1 for false, and 1 for true (or whatever mapping makes the most sense for you).

octave> vals = zeros (3, 3, "int8");
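A short sketch of the int8 approach, assuming the 0 = blank, -1 = false, 1 = true mapping described above. It shows that setting values, finding blanks, and calling ordinary numeric functions all work directly on the matrix.

```octave
% Assumed encoding: 0 = blank, -1 = false, 1 = true.
vals = zeros (3, 3, "int8");  % every element starts out blank
vals(1,1) = 1;                % set to true (stays int8 after assignment)
vals(2,3) = -1;               % set to false

is_blank = (vals == 0);       % logical mask of unset elements
is_true  = (vals == 1);
is_false = (vals == -1);

n_set  = nnz (~is_blank);     % numeric functions work directly
[r, c] = find (is_true);      % row/column of every true element
```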

Using a separate matrix to track blank values

If you really, really want to have a matrix of 0 and 1, then you need a separate matrix to keep track of which values have been set. In that case, both matrices can be of type logical, each taking up 8 bits per data point, for a total of 16 bits per data point. It also has the problem that you need to keep the two matrices in sync.

octave> vals = false (3, 3);
octave> set_vals = false (size (vals));
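A minimal sketch of keeping the two logical matrices in sync by hand: every assignment updates both, and only elements flagged in set_vals are treated as meaningful. The unset step (clearing the flag and resetting the value) is one possible convention, not the only one.

```octave
vals     = false (3, 3);          % the 0/1 values themselves
set_vals = false (size (vals));   % true where a value has been assigned

% Setting a value means updating BOTH matrices.
vals(1,1) = true;   set_vals(1,1) = true;
vals(2,3) = false;  set_vals(2,3) = true;

% Unsetting a value: clear the flag (and reset the stored value).
vals(1,1) = false;  set_vals(1,1) = false;

% Only elements flagged in set_vals are meaningful.
known_values = vals(set_vals);

% Two logical matrices: 2 bytes per data point in total.
total_bytes = sizeof (vals) + sizeof (set_vals);
```

Forgetting either half of an update silently desynchronizes the two matrices, which is the maintenance cost mentioned above.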

Making your own class

Using either the new classdef syntax (which requires Octave 4.0.0) or the old @class type, you can encapsulate any of the strategies above in its own class (I would personally use an 8-bit integer). With a signed 8-bit integer, this moves the logic of knowing which value (-1 or 0) means blank into the class. If you prefer a separate matrix for blank values, the logic of keeping the two matrices in sync moves into a setter method.

Upvotes: 1

Suever

Reputation: 65430

A numeric array cannot hold empty values. Typically in this case, people will use NaN as a placeholder value.

%// Initialize a 3D matrix of NaN values
data = nan(2, 3, 4);

size(data)
%//  2   3   4

It is then easy to differentiate a placeholder value from real data: you can detect placeholders using isnan.
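A short sketch of working with the NaN placeholders: isnan gives a mask of the blanks, and statistics can be computed on the real values only. The two assigned values are made up for illustration.

```octave
data = nan (2, 3, 4);   % 2x3x4 array, every element a placeholder
data(1,1,1) = 1;
data(2,3,4) = 0;

is_blank = isnan (data);          % true for every placeholder
n_known  = nnz (~is_blank);       % number of real values
known    = data(~is_blank);       % column vector of the real values
avg      = mean (known);          % statistics on real values only

% Beware: NaN propagates, so reducing over the whole array gives NaN.
total = sum (data(:));
```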

The only way to create an array of empty values (and it is highly discouraged due to the performance hit) is to use cell arrays.

data = cell(2, 3, 4);

Upvotes: 4