Reputation: 273
I need to simulate data in R with a fat-tailed distribution, and having never simulated data before, I'm not sure where to start. I have looked into the FatTailsR package, but the documentation is pretty cryptic and I can't find any obvious tutorials.
Basically, I want to create an artificial dataframe with two columns (X and Y), of 10,000 observations, that uses the following logic/iterations:
Any guidance would be appreciated, including suggestions of packages and functions to check out (maybe something like rlnorm?).
Upvotes: 2
Views: 1472
Reputation: 503
This is what I understood from your question:
data <- data.frame(X=1:10000, Y=sample(c(0,1), 10000, TRUE, prob=c(0.75, 0.25)))
head(data)
pos <- which(data$Y == 1)                            # rows where Y == 1
pos <- sample(pos, floor(0.25*length(pos)), FALSE)   # randomly keep 25% of them
data[pos, "Y"] <- data[pos, "Y"] + 1                 # promote those rows to Y == 2
## Iterate using a while loop:
data <- data.frame(X=1:10000, Y=sample(c(0,1), 10000, TRUE, prob=c(0.75, 0.25)))
head(data)
i <- 0
while(i < 10) {
  pos <- which(data$Y == (i + 1))                      # rows at the current level
  pos <- sample(pos, floor(0.25*length(pos)), FALSE)   # 25% of Y == i + 1
  data[pos, "Y"] <- data[pos, "Y"] + 1                 # promote them one level
  i <- i + 1
}
hist(data$Y)
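As a rough sanity check on the loop above, the sketch below reruns the same idea as a for loop and tabulates the result. The seed, the name `sim`, and the use of `sample.int()` for the draw are choices made for this illustration, not part of the original code.

```r
set.seed(123)  # arbitrary seed, only for reproducibility
n <- 10000
sim <- data.frame(X = seq_len(n),
                  Y = sample(c(0, 1), n, replace = TRUE, prob = c(0.75, 0.25)))

for (level in 1:10) {
  pos <- which(sim$Y == level)
  # promote a random 25% (rounded down) of the rows at this level;
  # indexing through sample.int() sidesteps sample()'s special case
  # for a single number
  promote <- pos[sample.int(length(pos), floor(0.25 * length(pos)))]
  sim$Y[promote] <- sim$Y[promote] + 1
}

counts <- table(factor(sim$Y, levels = 0:11))
print(counts)

# number of rows at each level or higher, then the ratio between
# consecutive levels (NaN once the counts reach zero)
at_least <- rev(cumsum(rev(as.vector(counts))))
print(round(at_least[-1] / at_least[-length(at_least)], 3))
```

If the logic is right, the ratios should stay close to 0.25 for as long as the counts are large enough for the rounding not to matter.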
Upvotes: 2
Reputation: 226182
This might work (not super-efficient, but ...)
First figure out the probabilities of each outcome (P(1)=0.75, P(2)=0.75*0.25, P(3)=0.75*0.25^2 ...)
cc <- cumprod(c(0.75,rep(0.25,9)))
Draw multinomial deviates with these probabilities (size=1 for each sample; rmultinom normalizes prob internally, so cc does not need to sum to 1):
rr <- t(rmultinom(1000,size=1,prob=cc))
Figure out which value in each row is equal to 1:
storage.mode(rr) <- "logical"
out <- apply(rr,1,which)
Check results:
tt <- table(factor(out,levels=1:10))
tt
##   1   2   3   4   5   6   7   8   9  10
## 756 183  43  14   3   1   0   0   0   0
There might be a cleverer way to set this up in terms of a modified geometric distribution ...
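One possible version of that geometric idea, as a minimal sketch: the probabilities listed above have the form P(k) = 0.75 * 0.25^(k-1), which is a geometric distribution shifted to start at 1. The seed, the variable names, and the choice of n = 10000 (taken from the question) are assumptions made for this illustration.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
n <- 10000   # number of observations requested in the question
p <- 0.75    # probability of stopping at each level, from P(1) = 0.75

# rgeom() counts failures before the first success (0, 1, 2, ...),
# so adding 1 gives P(k) = p * (1 - p)^(k - 1) for k = 1, 2, ...
y <- rgeom(n, prob = p) + 1

sim_geom <- data.frame(X = seq_len(n), Y = y)
print(table(sim_geom$Y))
```

Unlike the multinomial approach, this is not truncated at 10, but values above 10 occur with probability 0.25^10 (roughly 1 in a million), so the two should be hard to tell apart at this sample size.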
Upvotes: 1