I have the following data set:
aa <- data.frame("set_up" = c(1,1,1,1,1,1,2,2,2,3,3,3), set = c(1,1,1,2,2,2,1,1,1,3,3,3), mass = c(45,12,34,7,1,433,56,12,54,6,7,8))
I want to estimate the parameter k of the negative binomial distribution separately for each combination of set and set_up.
The call fitdist(data = aa$mass, distr = "nbinom", method = "mle")$estimate[[1]] gives the value of the k parameter for the whole data set. I want to estimate k for each group of set_up and set.
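For reference, here is a minimal sketch of the pooled fit, assuming fitdistrplus is installed. In fitdistrplus the negative binomial is parameterised by size and mu, so estimate[[1]] is the size parameter, which is the dispersion parameter called k here. Extracting it by name makes that explicit and does not depend on the order of the estimates.

```r
library(fitdistrplus)

aa <- data.frame(set_up = c(1,1,1,1,1,1,2,2,2,3,3,3),
                 set    = c(1,1,1,2,2,2,1,1,1,3,3,3),
                 mass   = c(45,12,34,7,1,433,56,12,54,6,7,8))

# Fit a negative binomial to all mass values at once (no grouping)
fit_all <- fitdist(data = aa$mass, distr = "nbinom", method = "mle")

# The estimates are named "size" (the dispersion parameter k) and "mu"
print(fit_all$estimate)

# Same value as estimate[[1]], but selected by name rather than position
k_pooled <- fit_all$estimate[["size"]]
print(k_pooled)
```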
Here is the dplyr code I used:
library(dplyr)
library(fitdistrplus)
aak <- aa %>%
  group_by(set_up, set) %>%
  summarise(ktotalinf = fitdist(data = aa$mass, distr = "nbinom", method = "mle")$estimate[[1]]) %>%
  as.data.frame()
I get output, but the same value is repeated in every row, and it equals the estimate[[1]] I get when all the mass values are pooled (not grouped). Any suggestions on how to resolve this?
Upvotes: 0
Views: 3390
Reputation: 263342
You got the answer, but not the reasoning behind it. The magrittr/dplyr mechanism creates a local environment in which each successive function along the chain of %>% passages is applied.
When you gave the fitdistrplus::fitdist function the data argument aa$mass, you actually went outside the local environment in which the values had been grouped separately by your set_up and set variables. There is no aa-named entity inside that local environment, so R looked further out and found the full aa data frame. There is an entity named . (a period), which gets passed along from function to function and is altered in some manner at each step. Instead of being applied to each group, fitdist always got the same argument: the entire mass column of aa. When you change the data argument to mass, the R interpreter first looks inside the local environment and finds a column named mass that holds only the rows of the current group.
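A minimal sketch of this behaviour, assuming dplyr (1.0 or later, for the .groups argument) and fitdistrplus are installed. The first pipeline only counts rows, so it shows the scoping difference without any model fitting: aa$mass sees all 12 rows, while mass (or the rlang .data$mass pronoun) sees only the 3 rows of the current group. The second pipeline applies the fix; fit_k is a hypothetical helper that returns NA instead of stopping the whole summarise() when a group cannot be fitted.

```r
library(dplyr)
library(fitdistrplus)

aa <- data.frame(set_up = c(1,1,1,1,1,1,2,2,2,3,3,3),
                 set    = c(1,1,1,2,2,2,1,1,1,3,3,3),
                 mass   = c(45,12,34,7,1,433,56,12,54,6,7,8))

# What does each expression see inside summarise()?
check <- aa %>%
  group_by(set_up, set) %>%
  summarise(
    n_global  = length(aa$mass),    # aa is found outside the data mask: all 12 rows
    n_masked  = length(mass),       # column of the current group only: 3 rows
    n_pronoun = length(.data$mass), # explicit data-mask pronoun: also 3 rows
    .groups = "drop"
  ) %>%
  as.data.frame()
print(check)

# Hypothetical helper: k (= size) for one group, NA if the MLE fit errors
fit_k <- function(x) {
  tryCatch(
    fitdist(data = x, distr = "nbinom", method = "mle")$estimate[["size"]],
    error = function(e) NA_real_
  )
}

# The fix: pass the grouped column mass, not aa$mass
aak <- aa %>%
  group_by(set_up, set) %>%
  summarise(ktotalinf = fit_k(mass), .groups = "drop") %>%
  as.data.frame()
print(aak)
```

With only three observations per group the estimates are fragile. The set_up 3 group, for example, has a sample variance (1) below its mean (7), which a negative binomial cannot represent, so its fit may fail or return an extreme size.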
Upvotes: 1