BenR
BenR

Reputation: 89

Problems with scale() and the Multidimensional Lp-Norm

Today I was trying to debug my code and stumbled across something that renders my solutions useless. What i am generally trying to calculate is the multidimensional L2-Norm for the following two matrices. As long as I am not using scale() everything is working fine. Nonetheless, as soon as I scale the matrices the solutions of the three used approaches are not the same anymore. What am I missing here?

set.seed(655)
df.a <- data.frame(A = sample(100:124, 24), B = sample(1:24, 24), C = sample(1:24, 24), D = rep(0, times=24))
df.b <- data.frame(A = sample(125:148, 24), B = sample(25:48, 24), C = sample(1:24, 24), D = sample(1:100, 24))

For this reason I have three different approaches:

  1. sapply-function and sqrt of rowSums

    sse <-  function(x1, x2) sum((x1 - x2) ^ 2)
    
    distanceChangeByTech <- function(x) {
      sse(df.a[,x], df.b[,x])
    }
    help1 <- t(data.frame(sapply(colnames(df.a), distanceChangeByTech)))
    dist_sap <- sqrt(rowSums(help1))
    
  2. multidimensional Euclidean distance using RCPP:

    multiEucl <- cxxfunction(signature(x="matrix", y="matrix"), plugin="Rcpp",
                      body='
                      Rcpp::NumericMatrix dx(x);
                      Rcpp::NumericMatrix dy(y);
    
                      const int N = dx.nrow();
                      const int M = dx.ncol();
    
                      double sum = 0;
    
                      for(int i=0; i<N; i++){
                      for(int j=0; j<M; j++){
                      sum = sum + pow(dx(i,j) - dy(i,j), 2);
                      }
                      }
    
                      return wrap(sqrt(sum));
                      ')
    
  3. multidimensional Lp-Norm using RCPP:

    multiPNorm <- cxxfunction(signature(x="matrix", y="matrix", p="numeric"), plugin="Rcpp",
                      body='
                      Rcpp::NumericMatrix dx(x);
                      Rcpp::NumericMatrix dy(y);
                      double dp = Rcpp::as<double>(p);
    
                      const int N = dx.nrow();
                      const int M = dx.ncol();
    
                      double sum = 0;
                      double rsum = 0;
    
                      for(int i=0; i<N; i++){
                      for(int j=0; j<M; j++){
                      sum = sum + pow(abs(dx(i,j) - dy(i,j)), dp);
                      }
                      }
    
                      rsum = pow(sum, 1/dp);
                      return wrap(rsum);
                      ')
    

When I tried this at first all worked well.

> multiEucl(as.matrix(df.a), as.matrix(df.b)) 
[1] 366.1543
> multiPNorm(as.matrix(df.a), as.matrix(df.b), 2) 
[1] 366.1543
> sqrt(rowSums(help1)) sapply.colnames.df.a...distanceChangeByTech. 
366.1543

But as soon as I scale the matrices, which I want to do because I will do a Clustering based on these distancemeasures, there is a fault. The solutions are not the same anymore?! What is causing this? I am using these commands to scale.

df.a <- as.data.frame(scale(df.a)) 
df.a[is.na(df.a)] <- 0
df.b <- as.data.frame(scale(df.b))
df.b[is.na(df.b)] <- 0

> multiEucl(as.matrix(df.a), as.matrix(df.b))
[1] 12.51781
> multiPNorm(as.matrix(df.a), as.matrix(df.b), 2)
[1] 8.944272
> sqrt(rowSums(help1))
sapply.colnames.df.a...distanceChangeByTech. 
                                    12.51781 

Upvotes: 0

Views: 127

Answers (2)

Dirk is no longer here
Dirk is no longer here

Reputation: 368509

You used abs() which is documented eg here but you meant to use fabs() which is documented here.

The cmath.h header provides overloaded abs() as well, but you probably didn't include that.

Upvotes: 2

BenR
BenR

Reputation: 89

It seems that abs() is not doing the right thing here. Instead I changed my coding of the multiPNorm and the changes seem to work.

multiPNorm <- cxxfunction(signature(x="matrix", y="matrix", p="numeric"), plugin="Rcpp",
                      body='
                      Rcpp::NumericMatrix dx(x);
                      Rcpp::NumericMatrix dy(y);
                      double dp = Rcpp::as<double>(p);

                      const int N = dx.nrow();
                      const int M = dx.ncol();

                      double sum = 0;
                      double rsum = 0;
                      double help = 0;

                      for(int i=0; i<N; i++){
                      for(int j=0; j<M; j++){
                        help = dx(i,j) - dy(i,j);
                        if (help < 0) {
                          help = - help;
                        }
                        sum = sum + pow(help, dp);
                      }
                      }

                      rsum = pow(sum, 1/dp);
                      return wrap(rsum);
                      ')

Upvotes: 0

Related Questions