Tim
Tim

Reputation: 7464

How does R base C code handle vectorization?

If we look at any function written in C in R base source code we can see that the code is quite simple, e.g.

#include "nmath.h"
#include "dpq.h"

double dexp(double x, double scale, int give_log)
{
#ifdef IEEE_754
    /* NaNs propagated correctly */
    if (ISNAN(x) || ISNAN(scale)) return x + scale;
#endif
    if (scale <= 0.0) ML_ERR_return_NAN;

    if (x < 0.)
    return R_D__0;
    return (give_log ?
        (-x / scale) - log(scale) :
        exp(-x / scale) / scale);
}

While the function is vectorized if called from R:

dexp(rep(1, 5), 1:2)
## [1] 0.3678794 0.2706706 0.3678794 0.2706706 0.3678794

How is it so? What makes it properly vectorized?

Upvotes: 3

Views: 125

Answers (1)

Martin Morgan
Martin Morgan

Reputation: 46876

Actually, you're not quoting the code that defines the R function. In R we see

> dexp
function (x, rate = 1, log = FALSE) 
.Call(C_dexp, x, 1/rate, log)
<bytecode: 0x3582cf8>
<environment: namespace:stats>

which tells us that the dexp() function is defined in the stats package. With a little digging we see in the stats NAMESPACE

useDynLib(stats, .registration = TRUE, .fixes = "C_")

which tells us that the prefix C_ is added to the function name when exposed to R. So we're looking for a C function called dexp in the stats package. There are two relevant entries

init.c:147:    CALLDEF_MATH2_1(dexp),
distn.c:144:DEFMATH2_1(dexp)

the first is a macro making the C function available to R, the second is a macro defining the function. For the later, the macro is defined as

#define DEFMATH2_1(name) \
    SEXP do_##name(SEXP sa, SEXP sb, SEXP sI) { \
        return math2_1(sa, sb, sI, name); \
    }

which tells us to look for the math2_1 function, a little further up the file

static SEXP math2_1(SEXP sa, SEXP sb, SEXP sI, double (*f)(double, double, int))
{
    SEXP sy;
    R_xlen_t i, ia, ib, n, na, nb;
    double ai, bi, *a, *b, *y;
    int m_opt;
    int naflag;

    if (!isNumeric(sa) || !isNumeric(sb))
    error(R_MSG_NONNUM_MATH);

    SETUP_Math2;
    m_opt = asInteger(sI);

    mod_iterate(na, nb, ia, ib) {
//  if ((i+1) % NINTERRUPT) R_CheckUserInterrupt();
    ai = a[ia];
    bi = b[ib];
    if_NA_Math2_set(y[i], ai, bi)
    else {
        y[i] = f(ai, bi, m_opt);
        if (ISNAN(y[i])) naflag = 1;
    }
    }
    FINISH_Math2;
    return sy;
} /* math2_1() */

That mod_iterate() call is actually another macro

#define mod_iterate(n1,n2,i1,i2) for (i=i1=i2=0; i<n; \
    i1 = (++i1 == n1) ? 0 : i1,\
    i2 = (++i2 == n2) ? 0 : i2,\
    ++i)

and you can see, at least with the eye of faith, that the C code is implementing a loop, so the magic implied by the original question (that compact C code results in vectorized calculations) is not there -- it is an iteration at the C level.

math2_1 takes as it's final argument a function, which in this case is the dexp quoted in the original question. This function is applied to each element in the iteration.

Upvotes: 6

Related Questions