stats_noob


(R) Error in Xi - Xj : non-numeric argument to binary operator

I am working with the R programming language. I am trying to recreate the graphs shown in this tutorial: https://www.rpubs.com/cboettig/greta-gp

This tutorial shows how to build a Gaussian process regression model for 2 variables (one input, one output). I am able to copy and paste the code from this tutorial and successfully make the desired graphs:

#PART 1
#load libraries
library(MASS)
library(tidyverse)

#set seed
set.seed(12345)

#create initial data
x_predict <- seq(-5,5,len=50)
l <- 1

#define functions for evaluating the covariance
SE <- function(Xi,Xj, l) exp(-0.5 * (Xi - Xj) ^ 2 / l ^ 2)
cov <- function(X, Y) outer(X, Y, SE, l)
COV <- cov(x_predict, x_predict)

#sample these functions, place them into a data frame and plot
values <- mvrnorm(200, rep(0, length=length(x_predict)), COV)
dat <- data.frame(x=x_predict, t(values)) %>%
  tidyr::pivot_longer(-x, names_to = "rep", values_to = "value") %>% 
  mutate(rep = as.numeric(as.factor(rep)))

ggplot(dat,aes(x=x,y=value)) +
  geom_line(aes(group=rep), color =  rgb(0.7, 0.1, 0.4), alpha = 0.4) 

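The covariance step relies on `outer()` calling `SE()` on every pair of inputs, so `COV` is a 50 x 50 matrix of pairwise correlations. A minimal base-R sketch that checks this, using the same grid and `l <- 1` as Part 1, and calling `outer()` directly instead of the redefined `cov`:

```r
# Squared-exponential kernel, same definition as in Part 1
SE <- function(Xi, Xj, l) exp(-0.5 * (Xi - Xj)^2 / l^2)
l <- 1
x_predict <- seq(-5, 5, len = 50)

# outer() evaluates SE(x_predict[i], x_predict[j], l) for every pair (i, j);
# the extra argument l is passed through outer()'s ... on to SE()
K <- outer(x_predict, x_predict, SE, l)

dim(K)             # 50 50
isSymmetric(K)     # TRUE, because SE(a, b) == SE(b, a)
all(diag(K) == 1)  # TRUE, because SE(a, a) = exp(0) = 1
K[1, 2]            # about 0.979: neighbouring grid points are strongly correlated
```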

#PART 2

#create new data

obs <- data.frame(x = c(-4, -3, -1,  0,  2),
                  y = c(-2,  0,  1,  2, -1))

#condition on the observations, then repeat the sampling and plotting steps from part 1

cov_xx_inv <- solve(cov(obs$x, obs$x))
Ef <- cov(x_predict, obs$x) %*% cov_xx_inv %*% obs$y
Cf <- cov(x_predict, x_predict) - cov(x_predict, obs$x)  %*% cov_xx_inv %*% cov(obs$x, x_predict)

values <- mvrnorm(200, Ef, Cf)

dat <- data.frame(x=x_predict, t(values)) %>%
  tidyr::pivot_longer(-x, names_to = "rep", values_to = "value") %>% 
  mutate(rep = as.numeric(as.factor(rep)))


gp <- data.frame(x = x_predict, Ef = Ef, sigma = 2*sqrt(diag(Cf)) )

ggplot(dat,aes(x=x,y=value)) + 
  geom_line(aes(group=rep), color =  rgb(0.7, 0.1, 0.4), alpha = 0.2) + #REPLICATES
  geom_ribbon(data = gp, 
              aes(x, 
                  y = Ef, 
                  ymin = Ef - sigma, 
                  ymax = Ef + sigma),
              fill="grey", alpha = 0.4) +
  geom_line(data = gp, aes(x=x,y=Ef), size=1) + #MEAN
  geom_point(data=obs,aes(x=x,y=y)) +  #OBSERVED DATA
  scale_y_continuous(limits=c(-3,3), name="output, f(x)") +
  xlab("input, x")

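For reference, Part 2 applies the standard noise-free Gaussian-process conditioning formulas. Writing $K(A, B)$ for the kernel matrix that the redefined `cov(A, B)` returns, $X$ for `obs$x`, $y$ for `obs$y` and $X_*$ for `x_predict`, the lines computing `Ef` and `Cf` implement:

```latex
\bar{f}_* = K(X_*, X)\,K(X, X)^{-1}\,y
\qquad
\Sigma_* = K(X_*, X_*) - K(X_*, X)\,K(X, X)^{-1}\,K(X, X_*)
```

Here `Ef` is $\bar{f}_*$ and `Cf` is $\Sigma_*$. The grey ribbon spans `Ef` plus or minus `sigma = 2*sqrt(diag(Cf))`, i.e. two pointwise posterior standard deviations (roughly a 95% band under the Gaussian assumption).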

Now, I am trying to replicate the above tutorial for a regression model with 3 variables (1 response, 2 predictors). I tried to make the "x_predict" object have two columns instead:

x_predict_1 <- seq(-5,5,len=50)
x_predict_2 <- seq(-6,6,len=50)

l <- 1

x_predict <- data.frame(x_predict_1, x_predict_2)


COV <- cov(x_predict, x_predict)

But this produces the following error:

Error in Xi - Xj : non-numeric argument to binary operator 

This error is preventing me from creating the "values" and "dat" objects from part 1, so I cannot create the desired graphs (e.g. x_predict_1 vs values and x_predict_2 vs values). It also prevents me from creating the desired graphs in part 2.

Can someone please show me how to fix this problem? Thanks


Answers (1)

Anup Tirpude


I think I found the problem. First, here is how to reproduce the error, following the same steps you took:

#PART 1
#load libraries
library(MASS)
library(tidyverse)

#set seed
set.seed(12345)

#create initial data
x_predict <- seq(-5,5,len=50)
l <- 1

#define functions for evaluating the covariance
SE <- function(Xi,Xj, l) exp(-0.5 * (Xi - Xj) ^ 2 / l ^ 2)
cov <- function(X, Y) outer(X, Y, SE, l)
COV <- cov(x_predict, x_predict)

#sample these functions, place them into a data frame and plot
values <- mvrnorm(200, rep(0, length=length(x_predict)), COV)
dat <- data.frame(x=x_predict, t(values)) %>%
  tidyr::pivot_longer(-x, names_to = "rep", values_to = "value") %>% 
  mutate(rep = as.numeric(as.factor(rep)))

ggplot(dat,aes(x=x,y=value)) +
  geom_line(aes(group=rep), color =  rgb(0.7, 0.1, 0.4), alpha = 0.4) 


x_predict_1 <- seq(-5,5,len=50)
x_predict_2 <- seq(-6,6,len=50)

l <- 1

x_predict <- data.frame(x_predict_1, x_predict_2)

COV <- cov(x_predict, x_predict)

Running this code ends with the same error you reported (Error in Xi - Xj : non-numeric argument to binary operator).

The point to note here is that R's built-in 'cov' function (from the stats package) is redefined as follows:

cov <- function(X, Y) outer(X, Y, SE, l)

This custom function only works with numeric vectors/arrays, not with data frames, and a data frame is exactly what was used to join x_predict_1 and x_predict_2 when extending the code.

If this custom function is called on a data.frame object, it will always result in an error, because it was not built to handle data frames; it was built only for numeric vectors and arrays.
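To see where the message comes from: a data frame is a list of columns, and `outer()` replicates its arguments with `rep()`, which turns a data frame into a plain list. `SE()` then evaluates `Xi - Xj` on two lists, and arithmetic is not defined for lists. A minimal base-R reproduction, with the two-column data frame built as in the question:

```r
SE <- function(Xi, Xj, l) exp(-0.5 * (Xi - Xj)^2 / l^2)
l <- 1
x_predict <- data.frame(x_predict_1 = seq(-5, 5, len = 50),
                        x_predict_2 = seq(-6, 6, len = 50))

is.list(x_predict)   # TRUE: a data frame is a list of column vectors
length(x_predict)    # 2: length() counts columns, not rows

# outer() replicates its inputs with rep(); on a data frame that yields a plain
# list, so SE() ends up subtracting one list from another and fails
msg <- tryCatch(outer(x_predict, x_predict, SE, l),
                error = function(e) conditionMessage(e))
msg

# The same message comes from subtracting two lists directly
msg2 <- tryCatch(list(1) - list(2), error = function(e) conditionMessage(e))
msg2
```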

Anyone who runs the code starting partway through will, by default, get R's built-in 'cov' function (from the stats package), which does accept data.frame objects. That is why it is strongly recommended never to redefine an existing function in R: it leads to a lot of confusion. If we remove the custom 'cov' function and call cov(x_predict, x_predict), it runs without error because the stats version is used; note, though, that it then returns the ordinary 2 x 2 sample covariance matrix of the two columns rather than the kernel matrix the tutorial needs.
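One way to follow that advice is to give the kernel helper its own name; `kern_cov` below is a hypothetical name chosen for this sketch, not part of the tutorial. The stats version can always be called explicitly as `stats::cov()`. This base-R sketch shows that the two functions compute different things:

```r
# Squared-exponential kernel from the tutorial
SE <- function(Xi, Xj, l) exp(-0.5 * (Xi - Xj)^2 / l^2)
l <- 1

# Hypothetical name for the kernel-matrix helper, so that stats::cov() is not masked
kern_cov <- function(X, Y) outer(X, Y, SE, l)

x_predict <- data.frame(x_predict_1 = seq(-5, 5, len = 50),
                        x_predict_2 = seq(-6, 6, len = 50))

# stats::cov() is the ordinary sample covariance: on a two-column data frame it
# returns a 2 x 2 matrix (one row/column per column), not a kernel matrix
S <- stats::cov(x_predict, x_predict)
dim(S)   # 2 2

# The kernel helper works on a numeric vector, e.g. the first predictor alone
K <- kern_cov(x_predict$x_predict_1, x_predict$x_predict_1)
dim(K)   # 50 50
```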

So to resolve this problem, you just need to use 'c' (combine) instead of 'data.frame' when joining x_predict_1 and x_predict_2. Here is the full code I tried with your variables:

library(MASS)
library(tidyverse)

#set seed
set.seed(12345)

SE <- function(Xi,Xj, l) exp(-0.5 * (Xi - Xj) ^ 2 / l ^ 2)
cov <- function(X, Y) outer(X, Y, SE, l)

x_predict_1 <- seq(-5,5,len=50)
x_predict_2 <- seq(-6,6,len=50)

l <- 1

x_predict <- c(x_predict_1, x_predict_2)
head(x_predict,5)
COV <- cov(x_predict, x_predict)

values <- mvrnorm(200, rep(0, length=length(x_predict)), COV)
dat <- data.frame(x=x_predict, t(values)) %>%
  tidyr::pivot_longer(-x, names_to = "rep", values_to = "value") %>% 
  mutate(rep = as.numeric(as.factor(rep)))

ggplot(dat,aes(x=x,y=value)) +
  geom_line(aes(group=rep), color =  rgb(0.7, 0.1, 0.4), alpha = 0.4) 

The end result is a plot of the sampled functions. I hope this explanation resolves your problem; if not, please let me know.
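It is worth checking what `c()` produces here: it appends the second 50-point grid to the first, giving a single vector of 100 numbers, so `COV` becomes a 100 x 100 kernel matrix over one input dimension rather than a kernel over two separate predictors. A base-R check, calling `outer()` directly instead of the redefined `cov`:

```r
SE <- function(Xi, Xj, l) exp(-0.5 * (Xi - Xj)^2 / l^2)
l <- 1
x_predict_1 <- seq(-5, 5, len = 50)
x_predict_2 <- seq(-6, 6, len = 50)

# c() concatenates: the second grid is appended after the first
x_predict <- c(x_predict_1, x_predict_2)
length(x_predict)   # 100

COV <- outer(x_predict, x_predict, SE, l)
dim(COV)            # 100 100

# Entries 1..50 come from x_predict_1, entries 51..100 from x_predict_2
identical(x_predict[51:100], x_predict_2)   # TRUE
```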

If you do not want to use 'c' (combine), you can use 'cbind' to create a matrix instead. Your custom 'cov' function runs on it without error, but as you continue with this approach you will run into other errors. The first one occurs because COV is then an array: outer() on two 50 x 2 matrices returns a 50 x 2 x 50 x 2 array rather than a 50 x 50 matrix. So I think, or at least I guess, that using 'c' (combine) is what you needed.
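With `cbind()` the input is a 50 x 2 matrix, and `outer()` on two matrices returns an array with dimensions `c(dim(X), dim(Y))`, which is why later steps such as `MASS::mvrnorm()` (which expects a square covariance matrix) break. If the goal is one kernel value per pair of rows, i.e. per pair of points (x_predict_1[i], x_predict_2[i]), one possible sketch uses the squared Euclidean distance between rows with a single shared length-scale `l`. This is an assumption about the intended two-predictor model, not the tutorial's code or this answer's fix, and `se_rows` is a hypothetical helper name:

```r
l <- 1
x_predict_1 <- seq(-5, 5, len = 50)
x_predict_2 <- seq(-6, 6, len = 50)
M <- cbind(x_predict_1, x_predict_2)   # 50 x 2 matrix, one row per input point

# Applying the 1-d kernel element-wise gives a 4-d array, not a covariance matrix
SE <- function(Xi, Xj, l) exp(-0.5 * (Xi - Xj)^2 / l^2)
A <- outer(M, M, SE, l)
dim(A)   # 50 2 50 2

# Hypothetical row-wise SE kernel: K[i, j] = exp(-0.5 * ||X[i, ] - Y[j, ]||^2 / l^2)
se_rows <- function(X, Y, l) {
  X <- as.matrix(X)
  Y <- as.matrix(Y)
  # squared Euclidean distance between every row of X and every row of Y
  d2 <- outer(rowSums(X^2), rowSums(Y^2), "+") - 2 * X %*% t(Y)
  d2[d2 < 0] <- 0   # clip tiny negative values caused by rounding
  exp(-0.5 * d2 / l^2)
}

K <- se_rows(M, M, l)
dim(K)   # 50 50: square, so it can serve as the Sigma argument of mvrnorm()
```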
