user72716

Reputation: 273

Simulate fat tail data in R

I need to simulate data in R with a fat tail distribution, and having never simulated data before I'm not sure where to start. I have looked into the FatTailsR package but the documentation is pretty cryptic and I can't seem to find any obvious tutorials.

Basically, I want to create an artificial dataframe with two columns (X and Y), of 10,000 observations, that uses the following logic/iterations:

Any guidance would be appreciated, including suggestions of packages and functions to check out (maybe something like rlnorm?).
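For reference, rlnorm() is in base R (package stats) and gives a continuous, right-skewed variable with a long tail (lighter than a power-law tail). A minimal sketch, assuming a continuous Y would be acceptable; the meanlog = 0 and sdlog = 1.5 values are arbitrary choices, not anything from the question:

```r
set.seed(42)
n <- 10000
# meanlog and sdlog are arbitrary illustrative values
dat <- data.frame(X = seq_len(n),
                  Y = rlnorm(n, meanlog = 0, sdlog = 1.5))
# A long right tail shows up as a mean well above the median
# and upper quantiles far above the median
c(median = median(dat$Y), mean = mean(dat$Y))
quantile(dat$Y, c(0.5, 0.9, 0.99))
```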

Upvotes: 2

Views: 1472

Answers (2)

Trusky

Reputation: 503

This is what I understood from your question:

data <- data.frame(X=1:10000, Y=sample(c(0,1), 10000, TRUE, prob=c(0.75, 0.25)))
head(data)

pos <- which(data$Y == 1)
pos <- sample(pos, floor(0.25*length(pos)), FALSE)  # 25% of Y == 1

data[pos, "Y"] <- data[pos, "Y"] + 1

## Iterate using a while loop:

data <- data.frame(X=1:10000, Y=sample(c(0,1), 10000, TRUE, prob=c(0.75, 0.25)))
head(data)

i <- 0

while(i < 10) {
  pos <- which(data$Y == (i + 1))
  pos <- sample(pos, floor(0.25*length(pos)), FALSE)  # 25% of the rows with Y == i + 1

  data[pos, "Y"] <- data[pos, "Y"] + 1

  i <- i + 1
}

hist(data$Y)
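The loop above can also be wrapped in a function so that the number of rows, the 25% step probability and the number of iterations become parameters. This is only a sketch: the name simulate_steps() and its arguments are made up for illustration, and it indexes with sample.int() so it never runs into sample()'s special case for a length-one numeric vector.

```r
# Hypothetical helper wrapping the while loop above; the function name and
# arguments are illustrative, not from the answer.
simulate_steps <- function(n = 10000, p = 0.25, steps = 10) {
  # Start as in the answer: Y is 1 with probability p, otherwise 0
  data <- data.frame(X = seq_len(n),
                     Y = sample(c(0, 1), n, replace = TRUE, prob = c(1 - p, p)))
  for (i in seq_len(steps)) {
    pos <- which(data$Y == i)               # rows currently at level i
    k <- floor(p * length(pos))             # how many of them move up
    pos <- pos[sample.int(length(pos), k)]  # pick k of them at random
    data$Y[pos] <- data$Y[pos] + 1
  }
  data
}

set.seed(1)
sim <- simulate_steps()
table(sim$Y)
```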

Upvotes: 2

Ben Bolker

Reputation: 226182

This might work (not super-efficient, but ...)

First figure out the probabilities of each outcome (P(1)=0.75, P(2)=0.75*0.25, P(3)=0.75*0.25^2 ...)

cc <- cumprod(c(0.75,rep(0.25,9)))
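As a quick arithmetic check on that vector: entry k of cc is 0.75 * 0.25^(k - 1), so the ten entries form a truncated geometric series summing to 1 - 0.25^10, just short of 1. rmultinom() normalises prob internally, so the missing mass (about 9.5e-07) does not matter. A short sketch to confirm:

```r
cc <- cumprod(c(0.75, rep(0.25, 9)))
# Entry k equals 0.75 * 0.25^(k - 1)
all.equal(cc, 0.75 * 0.25^(0:9))
# The shortfall from 1 is the truncated tail, 0.25^10
all.equal(1 - sum(cc), 0.25^10)
```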

Draw a multinomial deviate with these probabilities (size=1 for each sample):

rr <- t(rmultinom(1000,size=1,prob=cc))

Figure out which value in each row is equal to 1:

storage.mode(rr) <- "logical"
out <- apply(rr,1,which)
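If the apply() step is slow for large samples, max.col() is a vectorised alternative: each row of rr holds a single 1, so the column of the row maximum is the drawn outcome. A sketch comparing the two on fresh draws (the seed is arbitrary):

```r
set.seed(3)
cc <- cumprod(c(0.75, rep(0.25, 9)))
rr <- t(rmultinom(1000, size = 1, prob = cc))

out_apply  <- apply(rr == 1, 1, which)  # row by row, as in the answer
out_maxcol <- max.col(rr)               # one vectorised call
all(out_apply == out_maxcol)
```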

Check results:

tt <- table(factor(out,levels=1:10))
tt
  1   2   3   4   5   6   7   8   9  10 
756 183  43  14   3   1   0   0   0   0 
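To get the 10,000 rows and two columns the question asks for, the multinomial step can be skipped altogether: sample.int() draws outcome indices directly with weights cc (the weights need not sum to exactly 1). A sketch, with an arbitrary seed, that also puts the observed counts next to the expected ones:

```r
set.seed(7)
n  <- 10000
cc <- cumprod(c(0.75, rep(0.25, 9)))

# Draw Y in 1..10 directly, with probabilities proportional to cc
dat <- data.frame(X = seq_len(n),
                  Y = sample.int(10, size = n, replace = TRUE, prob = cc))

obs      <- as.vector(table(factor(dat$Y, levels = 1:10)))
expected <- n * cc / sum(cc)
round(cbind(k = 1:10, observed = obs, expected = expected), 1)
```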

There might be a cleverer way to set this up in terms of a modified geometric distribution ...
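One way to read that suggestion: rgeom(n, prob) counts the failures before the first success, with P(X = x) = prob * (1 - prob)^x for x = 0, 1, ..., so 1 + rgeom(n, 0.75) has P(k) = 0.75 * 0.25^(k - 1), the same probabilities as cc but without the cut-off at 10. A sketch, assuming that capping with pmin() is an acceptable way to truncate; note it moves the tiny mass above 10 (0.25^10, about 9.5e-07) onto 10 rather than renormalising as rmultinom() does:

```r
set.seed(11)
n <- 10000
# 1 + geometric(0.75): P(Y = k) = 0.75 * 0.25^(k - 1), k = 1, 2, ...
y <- 1 + rgeom(n, prob = 0.75)
y <- pmin(y, 10)  # cap at 10 to match the range used above
dat <- data.frame(X = seq_len(n), Y = y)
table(factor(dat$Y, levels = 1:10))
```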

Upvotes: 1
