f10w
f10w

Reputation: 1586

Working with massive numbers and minute numbers simultaneously

I have an mxn matrix A, with very small or very large elements. For example:

A = -1e4*randn(10,20);

I would like to create a new matrix C of the same size, as follows:

First, define a matrix B whose elements are the exponential of the elements of A:

B = exp(A)

Then, the matrix C is defined such that each column of C is proportional to the corresponding column of B and the sum of each column of C is equal to 1. In other words, if we take an element of B and divide it by the sum of all the elements of the same columns, then we obtain the corresponding element of C:

C = bsxfun(@rdivide,B,sum(B));

Mathematically:

enter image description here

Clearly, all the elements of C are between 0 and 1. However, when computing using the following code, the obtained matrix contains many NaNs:

A = -1e4*randn(10,20);

B = exp(A);

C = bsxfun(@rdivide,B,sum(B));

sum(sum(isnan(ee)))

I hope somebody can suggest a way to overcome this issue. Thank you very much in advance.

Update: Please note that the goal is the matrix C. I defined the matrix B just for explanation purpose, and we may not have to compute it. (We shouldn't actually. As @EJG89 pointed out that B contains only Inf and 0).

Update 2: Thanks to @EJG89 for the link to the Log Sum of Exponentials technique, which might be useful. I'm working on finding similar analytical tricks for my problem.

Update 3: As suggested by @Dan and @EJG89, we can subtract each column with a constant to obtain a new matrix within a reasonable range. Clearly, we have

enter image description here

for any constant C. We can choose C as the maximum value of each column:

enter image description here

(a_{max,j} is the maximum of the j-th column), then

enter image description here

I feel that this choice might give a very good approximation, but I don't know how good it is :|

The new code is:

A = bsxfun(@minus,A,max(A));
B = exp(A);
C = bsxfun(@rdivide,B,sum(B));

Upvotes: 3

Views: 136

Answers (4)

EJG89
EJG89

Reputation: 1189

Direct method is impossible.

A = -1e4*randn(10,20);
B = exp(A);

The problem is that B already yields Inf or 0.

Yielding division impossible returning Nan's.

ndoublemax = realmax
nsinglemax = realmax('single')

ndoublemin = realmin
nsinglemin = realmin('single')

yielding:

ndoublemax =

1.7977e+308

nsinglemax =

3.4028e+038

ndoublemin =

2.2251e-308

nsinglemin =

1.1755e-038

So your values for B clearly exceed those values.

An approximation can be made by rewriting the exponentials using this blog post

So first rewrite to: log(Cij) = log (exp(Aij)/sumk(exp(A(kj))))

log(Cij) = Aij - log( sumk(exp(A(kj)))

In which the last term can be approximated by: m = maxk(A(kj))

log(sumk(exp(A(kj))) = m + log(exp(A(kj)-m))

This results in smaller values able to be handled by MatLab, i.e. a normalization so to speak. Still no exact answer though because some values become zero removing their sometimes significant contribution.

Upvotes: 0

Dan
Dan

Reputation: 45741

You want to adjust A to some new A' such that eA = CeA' where C is a constant (that is much less than 1). In other words you're looking for some k such that k.eA is small enough not to breach ndoublemax or eps. But we want this k to be applied to A so we need to get it into the exponent. k = eln(k) and eln(k).eA = eA + ln(k) therefore if we add or subtract a number from A, eA is proportionally effected. Or A' = A + ln(k)

unfortunately I think that your range is simply too large for any ln(k) to stop you from breaching the limits on Matlab doubles. If you add a big number you'll get all of B equal to inf and if you subtract a big number you'll get all of B equal to zero.

So you need to find some way to work with both massive number and minute numbers simultaneously.

Upvotes: 3

Luis Mendo
Luis Mendo

Reputation: 112659

One approach could be to resort to symbolic computation:

A = -1e4*randn(10,20);
S = sym(A,'d'); %// convert to symbolic
C = exp(S)./repmat(sum(exp(S)),size(S,1),1);

The result C is given in symbolic form. For example, the first column will be something like

>> C(:,1)
ans =
   2.9347694526167482187885232843876*10^(-790)
  1.548257079197982942994953632259*10^(-10497)
  8.4257350773369612716806306890481*10^(-6390)
  3.1937456016712092927990884814437*10^(-7937)
  4.0377045095817442881784227420794*10^(-4076)
                                           1.0
  3.3704875581881026436375125643672*10^(-6482)
 8.9221015470879963245101895439354*10^(-12246)
 9.4968643813486260650483647051531*10^(-11252)
 1.8777491443104052522832960625121*10^(-11084)

But note that even with symbolic computation the largest term is given as 1, when actually it is (slightly) smaller. Of course, if you then convert to double you only get 0 or 1 values:

>> Cd = eval(C);
>> Cd(:,1)
ans =
     0
     0
     0
     0
     0
     1
     0
     0
     0
     0

Upvotes: 0

tashuhka
tashuhka

Reputation: 5126

Following the discussion of @EJG89, you can substitute infinite values with the maximum representable number, and then add eps to all the number in the denominator. In this way, you avoid the Inf.

A = -1e4*randn(10,20);

B = exp(A);
B(isinf(B)) = realmax;

C = bsxfun(@rdivide,B,sum(B)+eps);

sum(isnan(C(:)))

Upvotes: 0

Related Questions