Reputation: 299
I am trying to generate a cluster of outliers, as in the figure below, so that I can investigate that situation in more depth.
I could not get it to work, because the figure has only one dependent and one independent variable. I want the same situation but with more than one independent variable, i.e. one dependent variable and a matrix of independent variables.
My attempt in R is below:
n=50
p=2
x <- matrix(rnorm(n*p),ncol = p)
y <- rnorm(n)
b=quantile(x,probs = 0.95)
id=which(x>b)
no=length(id)
x[id]=rnorm(no,5,0.5)
y[id]=rnorm(1)+10
UPDATE
I tried the following code, but the result is still not the same as shown in the figure:
xa=rnorm(50)
xb=runif(50,min = 0,max=400)
x=rbind(xa,xb)
y=rnorm(100)
plot(x,y)
Upvotes: 3
Views: 1143
Reputation: 518
You can reproduce your plot with
set.seed(1)
xa = runif(20,0,20)
xb = runif(5,50,60)
x = c(xa,xb)
y = c(runif(20,25,120),runif(5,30,40))
plot(x,y,xlab="Independent variable",ylab="Response variable",xlim=c(0,60),ylim=c(25,120),pch=16)
Upvotes: 2
Reputation: 5
A quick and dirty workaround for the multidimensional data would be to write a loop that generates rnorm values and saves them as column vectors in a data frame.
Another option is to use the MASS package's mvrnorm function (or rmvnorm from the mvtnorm package).
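As a rough sketch of the multivariate-normal option: MASS provides mvrnorm(), which returns an n x p matrix when n > 1. The values of p, mu, Sigma and beta below are assumptions chosen only for illustration, and the linear model for y is likewise hypothetical.

```r
library(MASS)  # provides mvrnorm()

set.seed(1)
n <- 50   # number of observations
p <- 3    # number of independent variables (assumed value)

# Assumed mean vector and covariance matrix for the bulk of the data
mu    <- rep(0, p)
Sigma <- diag(p)

# n x p matrix of predictors; each row is one observation
x <- mvrnorm(n, mu = mu, Sigma = Sigma)

# Response from an assumed linear model with standard normal noise
beta <- rep(1, p)
y <- drop(x %*% beta) + rnorm(n)
```

Replacing diag(p) with a non-diagonal covariance matrix would give correlated predictors, which a column-by-column rnorm loop cannot do directly.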
For the outliers, you could generate a bunch of random numbers using runif(n, min=a, max=b), using the same loop-to-data-frame process I mentioned.
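One possible way to put this together, sketched without an explicit loop: pick a few whole observations (rows) and overwrite both their predictors and their response with runif() draws from a narrow range far from the bulk. The cluster size n_out and the ranges 5 to 6 and 9 to 10 are assumed values, not taken from the figure.

```r
set.seed(1)
n     <- 50   # total observations
p     <- 3    # number of independent variables (assumed)
n_out <- 5    # size of the outlier cluster (assumed)

# Bulk of the data: standard normal predictors, one column per variable
x <- matrix(rnorm(n * p), ncol = p)
y <- rnorm(n)

# Choose whole rows (observations), not individual matrix cells
id <- sample(n, n_out)

# Move those observations into a tight region far from the bulk
x[id, ] <- matrix(runif(n_out * p, min = 5, max = 6), ncol = p)
y[id]   <- runif(n_out, min = 9, max = 10)

dat <- data.frame(y = y, x)   # columns y, X1, ..., Xp
pairs(dat, pch = 16)          # the cluster shows up in each y-vs-X panel
```

Indexing by row keeps x and y aligned, so y stays at length n and each outlying observation is outlying in every predictor.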
Upvotes: 0