Reputation: 13
I have a dataset whose observations span a wide range (10,000 to around 21,000,000). I am trying to overlay a Poisson distribution on a histogram of this data, but the overlaid distribution is not displayed correctly. This is the code I have tried so far:
dat <- read.csv('data.csv', TRUE, ',')
hist(dat,
main = 'Global Sales of Games in 2010',
xlab = 'Amount of Copies Sold',
ylab = 'Counts',
col = 'palegreen1',
breaks = 100
)
lam = mean(dat)
t = seq(min(dat), max(dat), length.out = 10000)
lines(t, dpois(t, lambda = lam), col='red', lwd=3)
I have also tried this by generating data from a Poisson distribution using rpois, but I still run into the same problem.
simulated = rpois(length(dat), lam)
simulated_lam = mean(simulated)
a = seq(min(simulated), max(simulated), length.out = 10000)
hist(simulated)
lines(a, dpois(a, lambda = simulated_lam), col='red', lwd=3)
I have referred to the question "R: Overlay Poisson distribution over histogram of data", but cannot reproduce its results.
I have images of the resulting output, but cannot post them because this is a new account. If anyone knows an alternative way of posting images, I would gladly follow up.
Thanks in advance.
Upvotes: 1
Views: 1092
Reputation: 4658
Your code throws some warnings, since you are using dpois(t, lambda = lam) with a t that is not an integer (you can see those warnings by typing warnings() in your console). By changing length.out = 10000 into by = 1, you force t to consist only of integers, assuming your dat contains only integers.
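A small sketch of the behaviour described above, using made-up values (lam = 10 and the grid from 0 to 20 are assumptions for illustration, not taken from the question's data). It shows that dpois() warns and returns 0 at a non-integer point, and that a grid built with by = 1 avoids this:

```r
lam <- 10

# A non-integer point: dpois() warns and returns 0.
# tryCatch() captures the warning message instead of printing it.
msg <- tryCatch(dpois(2.5, lambda = lam),
                warning = function(w) conditionMessage(w))

# Grid built with length.out: most points are fractional,
# so the density evaluates to 0 there (warnings suppressed here).
t_frac <- seq(0, 20, length.out = 8)
d_frac <- suppressWarnings(dpois(t_frac, lambda = lam))

# Grid built with by = 1: integers only, so no warnings.
t_int <- seq(0, 20, by = 1)
d_int <- dpois(t_int, lambda = lam)
```

Plotting d_frac against t_frac would therefore give a curve that is flat at zero almost everywhere, which may be what the original overlay looked like.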
Below, I made an example that works (in which dat is randomly generated by me). Note that I multiplied the dpois() call by the dataset size to go from densities to counts.
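# Simulated stand-in for the real data
dataset_size <- 100
dat <- rpois(dataset_size, lambda = 10)
hist(dat,
main = 'Global Sales of Games in 2010',
xlab = 'Amount of Copies Sold',
ylab = 'Counts',
col = 'palegreen1',
breaks = 100
)
# Estimate lambda from the sample mean
lam = mean(dat)
# Integer grid, so dpois() is only evaluated at valid points
t = seq(min(dat), max(dat), by = 1)
# Scale probabilities by the sample size to get expected counts
lines(t, dpois(t, lambda = lam)*dataset_size, col='red', lwd=3)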
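Multiplying by the dataset size gives expected counts when each histogram bin contains at most one integer, which is likely with breaks = 100 on such a small range. For data with a wide range, such as the sales figures in the question, each bin spans many integers. One option (a sketch, not the only approach) is to compute the expected count per bin from the Poisson CDF with ppois(). The simulated data, lambda = 1000, and breaks = 30 below are assumptions for illustration:

```r
set.seed(1)
n <- 500
dat <- rpois(n, lambda = 1000)   # simulated stand-in for real data

# Compute the histogram without drawing it, to get the bin edges
h <- hist(dat, breaks = 30, plot = FALSE)

# hist() uses right-closed bins (a, b] by default, so the expected
# count per bin is n * (P(X <= b) - P(X <= a))
lam <- mean(dat)
expected <- n * diff(ppois(h$breaks, lambda = lam))

# Draw the histogram and overlay expected counts at the bin midpoints
plot(h, col = 'palegreen1', main = 'Simulated counts',
     xlab = 'Value', ylab = 'Counts')
lines(h$mids, expected, col = 'red', lwd = 3)
```

This avoids building an integer grid with millions of points. If the red curve turns out far narrower than the histogram on the real data, that would suggest the data are more spread out than a Poisson model allows, since a Poisson distribution has variance equal to its mean.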
Upvotes: 0