Alex
Alex

Reputation: 2531

C multiplication or addition floats results NaN

I'm using libresample at program. After some time (about 50 min) it crashes in lib's function lrsFilterUD() at one workstation.

float lrsFilterUD(float Imp[],  /* impulse response */
              float ImpD[], /* impulse response deltas */
              UWORD Nwing,  /* len of one wing of filter */
              BOOL Interp,  /* Interpolate coefs using deltas? */
              float *Xp,    /* Current sample */
              double Ph,    /* Phase */
              int Inc,    /* increment (1 for right wing or -1 for left) */
              double dhb)
{
   float a;
   float *Hp, *Hdp, *End;
   float v, t;
   double Ho;

   v = 0.0; /* The output value */
   Ho = Ph*dhb;
   End = &Imp[Nwing];
   if (Inc == 1)        /* If doing right wing...              */
   {                      /* ...drop extra coeff, so when Ph is  */
      End--;            /*    0.5, we don't do too many mult's */
      if (Ph == 0)      /* If the phase is zero...           */
         Ho += dhb;     /* ...then we've already skipped the */
   }                         /*    first sample, so we must also  */
                        /*    skip ahead in Imp[] and ImpD[] */

   if (Interp)
      while ((Hp = &Imp[(int)Ho]) < End) {
         t = *Hp;       /* Get IR sample */
         Hdp = &ImpD[(int)Ho];  /* get interp bits from diff table*/
         a = Ho - floor(Ho);      /* a is logically between 0 and 1 */
         t += (*Hdp)*a; /* t is now interp'd filter coeff */
         t *= *Xp;      /* Mult coeff by input sample */
         v += t;            /* The filter output */
         Ho += dhb;     /* IR step */
         Xp += Inc;     /* Input signal step. NO CHECK ON BOUNDS */
      }
   else 
      while ((Hp = &Imp[(int)Ho]) < End) {
         dprintf("while begin: Hp = %p, *Hp = %a, (int)Ho = %d, Imp[(int)Ho] = %a, &Imp[(int)Ho] = %p", Hp, *Hp, (int)Ho, Imp[(int)Ho], &Imp[(int)Ho]);
         t = *Hp;       /* Get IR sample */
         dprintf("before t = %a, *Xp = %a, Xp = %p", t, *Xp, Xp);
         t *= *Xp;      /* Mult coeff by input sample */
         dprintf("after2 t = %a, v = %a", t, v);
         v += t;            /* The filter output */
         dprintf("v = %a", v);
         Ho += dhb;     /* IR step */
         Xp += Inc;     /* Input signal step. NO CHECK ON BOUNDS */
      }

   return v;
}

I logged values of t, *Xp, Xp before and after multiplication:

while begin: Hp = 0xaf5daa8, *Hp = -0.009034, (int)Ho = 16384, Imp[(int)Ho] = -0.009034, &Imp[(int)Ho] = 0xaf5daa8
before multiplication t = -0.009034, *Xp = 0.000000, Xp = 0xaebe9b8
after multiplication t = nan

This code run many times, there are the same t and Xp values before crash:

before multiplication t = -0.009034, *Xp = 0.000000, Xp = 0xaebe9c8
after multiplication t = -0.000000, v = 282.423676

Or another case with addition:

before addition t = -460.799988, v = 0.000000
after addition v = nan

What could be causing nan? This is compiled with gcc 4.1.2 on Linux.

Update: Print variables as %a. Result:

//t = 0x1.2806bap+2
//Hp = 0xb3bb870
t = *Hp;
//t = nan

Update 2: There is no such issue if code is compiled by icpc. So is there compiler specific issue?

Upvotes: 1

Views: 4320

Answers (4)

Stephen Canon
Stephen Canon

Reputation: 106177

There is a fifth possibility that Eric Postpischil's otherwise excellent answer doesn't mention:

  • The multiplication is being performed in the x87 registers, and a floating-point stack overflow has occurred due to (possibly unrelated) earlier operations in your program's execution. When the processor is in this failure state, all computations performed on the x87 registers produce NaN results.

The two most common causes of this are calling functions that return floating-point results that do not have a prototype in scope (with many calling conventions, this will result in the caller failing to pop the result off the FP stack), and incorrect hand-written (possibly inline) assembly.

The fact that the failure only occurs after some time has elapsed provides some evidence for this possibility; if there is a single rarely-used codepath that leaks an element of the floating-point stack, it would need to be used some number of times before the failure manifests, which could have allowed it to escape notice until now.

To diagnose or rule out this possibility, you need to look at bit 6 (SF) of the floating-point status register (FPSR). Depending on the compiler you are using, the exact means to inspect the FPSR may vary.

Upvotes: 4

Eric Postpischil
Eric Postpischil

Reputation: 222724

Clearly, -0.009034•0.000000 should not produce NaN. Therefore, either the code and data presented in the problem is not an accurate representation of the actual computation, or the computing implementation is defective.

If we assume the hardware and basic computing implementation is not defective, then some possibilities to investigate include:

  • The logging of t and *Xp failed to log the correct values of t and *Xp immediately before the multiplication or the correct value of t immediately after the multiplication.
  • The display of the values of t or *Xp are incorrect. E.g., the formatting used to display *Xp showed “0.000000” even though *Xp had some other value, such as NaN.
  • Xp points somewhere inappropriate, resulting in *Xp being unreliable (e.g., changed by external operations).
  • The code displayed is inaccurate. E.g., it is from old source that has been changed, or it is new source but previously compiled code is being executed.

Note: When debugging with floating-point objects, you should not print with formats such as “%f”, especially not with default values for numbers of digits. You should print with “%a”, which prints the exact value of a floating-point value, using a hexadecimal representation. You may also use “%.99g” in many situations, provided your C implementation provides a good conversion of the floating-point value to decimal.

Upvotes: 6

Mats Petersson
Mats Petersson

Reputation: 129374

You'll have to print the sub-results for each of your calculations - or use the isnan() function to check at regular places, and track down where it's coming from. It's either some "bad" math, or you are feeding in garbage in the first place (uninitialized variables may well be NaN's)

Upvotes: 0

Sunil Bojanapally
Sunil Bojanapally

Reputation: 12658

From Wiki, there are three kinds of operations that can return NaN as follows:

1. Operations with a NaN as at least one operand.
2. Indeterminate forms
      The divisions 0/0 and ±∞/±∞
      The multiplications 0×±∞ and ±∞×0
      The additions ∞ + (−∞), (−∞) + ∞ and equivalent subtractions
      The standard has alternative functions for powers:
      The standard pow function and the integer exponent pown function define 0pow(0), 1pow(∞),
      and ∞pow(0) as 1.
      The powr function defines all three indeterminate forms as invalid operations and 
      so returns NaN.
3. Real operations with complex results, for example:
      The square root of a negative number.
      The logarithm of a negative number
      The inverse sine or cosine of a number that is less than −1 or greater than +1.

Now this should help you solving the issue yourself.

Upvotes: 1

Related Questions