Berk U.
Berk U.

Reputation: 7188

N-Dimensional Histogram Counts

I am currently trying to code up a function to assign probabilities to a collection of vectors using a histogram count. This is essentially a counting exercise, but requires some finesse to be able to achieve efficiently. I will illustrate with an example:

Say that I have a matrix X = [x1, x2....xM] with N rows and M columns. Here, X represents a collection of M, N-dimensional vectors. IN other words, each of the columns of X is an N-dimensional vector.

As an example, we can generate such an X for M = 10000 vectors and N = 5 dimensions using:

X = randint(5,10000)

This will produce a 5 x 10000 matrix of 0s and 1s, where each column is represents a 5 dimensional vector of 1s and 0s.

I would like to assign a probability to each of these vectors through a basic histogram count. The steps are simple: first find the unique columns of X; second, count the number of times each unique column occurs. The probability of a particular occurrence is then the #of times this column was in X / total number of columns in X.

Returning to the example above, I can do the first step using the unique function in MATLAB as follows:

UniqueXs = unique(X','rows')' 

The code above will return UniqueXs, a matrix with N rows that only contains the unique columns of X. Note that the transposes are due to weird MATLAB input requirements.

However, I am unable to find a good way to count the number of times each of the columns in UniqueX is in X. So I'm wondering if anyone has any suggestions?

Broadly speaking, I can think of two ways of achieving the counting step. The first way would be to use the find function, though I think this may be slow since find is an elementwise operation. The second way would be to call unique recursively as it can also provide the index of one of the unique columns in X. This should allow us to remove that column from X and redo unique on the resulting X and keep counting.

Ideally, I think that unique might already be doing some counting so the most efficient way would probably be to work without the built-in functions.

Upvotes: 1

Views: 2491

Answers (1)

Amro
Amro

Reputation: 124563

Here are two solutions, one assumes all values are either 0's or 1's (just like the example in your description), the other does not. Both codes should be very fast (more so the one with binary values), even on large data.

1) only zeros and ones

%# random vectors of 0's and 1's
x = randi([0 1], [5 10000]);    %# RANDINT is deprecated, use RANDI instead

%# convert each column to a binary string
str = num2str(x', repmat('%d',[1 size(x,1)])); %'

%# convert binary representation to decimal number
num = (str-'0') * (2.^(size(s,2)-1:-1:0))';    %'# num = bin2dec(str);

%# count frequency of how many each number occurs
count = accumarray(num+1,1);                   %# num+1 since it starts at zero

%# assign probability based on count
prob = count(num+1)./sum(count);

2) any positive integer

%# random vectors with values 0:MAX_NUM
x = randi([0 999], [5 10000]);

%# format vectors as strings (zero-filled to a constant length)
nDigits = ceil(log10( max(x(:)) ));
frmt = repmat(['%0' num2str(nDigits) 'd'], [1 size(x,1)]);
str = cellstr(num2str(x',frmt));               %'

%# find unique strings, and convert them to group indices
[G,GN] = grp2idx(str);

%# count frequency of occurrence
count = accumarray(G,1);

%# assign probability based on count
prob = count(G)./sum(count);

Now we can see for example how many times each "unique vector" occurred:

>> table = sortrows([GN num2cell(count)])
table = 
    '000064850843749'    [1]       # original vector is: [0 64 850 843 749]
    '000130170550598'    [1]       # and so on..
    '000181606710020'    [1]
    '000220492735249'    [1]
    '000275871573376'    [1]
    '000525617682120'    [1]
    '000572482660558'    [1]
    '000601910301952'    [1]
    ...

Note that in my example with random data, the vector space becomes very sparse (as you increase the maximum possible value), thus I wouldn't be surprised if all counts were equal to 1...

Upvotes: 1

Related Questions