jeza

Reputation: 299

Generate specific outliers with random data in R

I am trying to produce a cluster of outliers, as in the figure below, to investigate that situation in more depth.

I tried without success, because the figure has one dependent and one independent variable. I want the same situation but with more than one independent variable, that is, one dependent variable and a matrix of independent variables.

[Figure: scatter plot with a cluster of outliers]

My attempted R code is below:

n=50
p=2
x <- matrix(rnorm(n*p),ncol = p)
y <- rnorm(n)
b=quantile(x,probs = 0.95)
id=which(x>b)
no=length(id)
x[id]=rnorm(no,5,0.5)
y[id]=rnorm(1)+10

UPDATE

I tried the following code, but the result is still not the same as shown in the figure:

xa=rnorm(50)
xb=runif(50,min = 0,max=400)
x=rbind(xa,xb)
y=rnorm(100)
plot(x,y)

Upvotes: 3

Views: 1143

Answers (2)

You can reproduce your plot with

set.seed(1)

xa = runif(20,0,20)
xb = runif(5,50,60)
x  = c(xa,xb)

y  = c(runif(20,25,120),runif(5,30,40))

plot(x,y,xlab="Independent variable",ylab="Response variable",xlim=c(0,60),ylim=c(25,120),pch=16)
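The same construction might be extended to several independent variables by drawing every column of the predictor matrix in the same way. The following is only a sketch: the number of predictors (p = 3) and the names n_bulk, n_out, X_bulk, X_out and is_outlier are assumptions made for illustration, and the ranges are simply carried over from the code above.

```r
set.seed(1)

n_bulk <- 20   # regular observations
n_out  <- 5    # clustered outliers
p      <- 3    # number of independent variables (assumed for illustration)

# Bulk of the data: each predictor uniform on [0, 20]
X_bulk <- matrix(runif(n_bulk * p, 0, 20), ncol = p)
# Outlier cluster: each predictor uniform on [50, 60]
X_out  <- matrix(runif(n_out * p, 50, 60), ncol = p)
X <- rbind(X_bulk, X_out)
colnames(X) <- paste0("x", seq_len(p))

# Response: wide spread for the bulk, narrow band for the outlier cluster
y <- c(runif(n_bulk, 25, 120), runif(n_out, 30, 40))

# Flag the outlier rows so they can be coloured in plots
is_outlier <- rep(c(FALSE, TRUE), times = c(n_bulk, n_out))

# Scatter-plot matrix of the response against all predictors
pairs(cbind(y, X), pch = 16, col = ifelse(is_outlier, "red", "black"))
```

Because the outlier rows are far from the bulk in every column, the cluster should appear in each panel of the scatter-plot matrix; restricting the shift to a subset of columns would give outliers in only some predictors.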

Upvotes: 2

392781

Reputation: 5

A quick and dirty workaround for multidimensional data would be a loop that generates rnorm values and saves them as column vectors in a data frame.

Another option is to use the MASS package's mvrnorm function (or rmvnorm from the mvtnorm package).

For the outliers, you could generate random numbers with runif(n, min=a, max=b), using the same loop-to-data-frame process mentioned above.
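These suggestions could be combined roughly as follows. This is a sketch under stated assumptions rather than a definitive recipe: it uses mvrnorm from MASS with an identity covariance matrix for the regular observations, and the outlier ranges (5 to 6 for the predictors, 9 to 11 for the response) as well as the names n_out, out_rows and dat are hypothetical choices.

```r
library(MASS)  # provides mvrnorm()

set.seed(1)
n     <- 50   # total number of observations
n_out <- 5    # how many of them form the outlier cluster (assumed)
p     <- 2    # number of independent variables

# Regular observations: multivariate normal, independent unit-variance columns
X <- mvrnorm(n, mu = rep(0, p), Sigma = diag(p))
colnames(X) <- paste0("x", seq_len(p))
y <- rnorm(n)

# Overwrite the last n_out rows with uniform draws far from the bulk,
# one predictor column at a time
out_rows <- (n - n_out + 1):n
for (j in seq_len(p)) {
  X[out_rows, j] <- runif(n_out, min = 5, max = 6)
}
y[out_rows] <- runif(n_out, min = 9, max = 11)

# Collect response and predictors in one data frame
dat <- data.frame(y = y, X)
```

Replacing whole rows, rather than individual cells, keeps each outlying observation outlying in both the predictors and the response.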

Upvotes: 0
