Reputation: 25
I am trying to impute missing values in my dataframe with the non-parametric method available in missForest.
My data (OneDrive link) consists of one categorical variable and five continuous variables.
head(data)
               phylo     sv1 sv2      sv3 sv4 sv5
1 Phaon_camerunensis 6.03803  NA 5121.257  NA  70
2   Umma_longistigma 6.03803  NA 5121.257  NA  53
3   Umma_longistigma 6.03803  NA 5121.257  NA  64
4   Umma_longistigma 6.03803  NA 5121.257  NA  63
5      Sapho_ciliata 6.03803  NA 5121.257  NA  63
6     Sapho_gloriosa 6.03803  NA 5121.257  NA  63
I was successful at first using missForest()
imp <- missForest(data[2:6])
However, instead of imputing over the whole data matrix at once, I would like to impute missing values separately for each level of phylo.
I tried data[2:6] %>% group_by(phylo) %>%
and sapply(split(data[2:6], data$phylo)) %>%
but no success.
Any guess on how to deal with it?
Upvotes: 0
Views: 253
Reputation: 1
Although the question is not very clear, I assume that you want to impute subsets of your dataset according to the phylo variable. For that, you need to split your dataset by your factor variable and apply the imputation function to each subset. This can be done using only base R functions:
# convert phylo to factor
data$phylo <- as.factor(data$phylo)
# split the numeric columns by phylo level and impute each subset separately
data2 <- lapply(split(data[2:6], data$phylo), function(x) missForest::missForest(x))
# extract the imputed data frame (ximp) for each level
lapply(data2, function(res) res$ximp)
Upvotes: 0
Reputation: 24148
If you want to run missForest for each group, you can use group_map:
imp <- data %>% group_by(phylo) %>% group_map(~ missForest(as.data.frame(.x)))
To get only the first item (the imputed data, ximp) from each result:
imp2 <- t(sapply(imp, "[[", 1))
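If you also want the imputed values back in the original row order next to phylo, the split / impute / recombine pattern can be sketched in base R as below. This is only a minimal sketch: impute_by_group and mean_impute are hypothetical helper names, and the per-group mean is a simple stand-in imputer used so the example runs without extra packages. It is not what missForest does.

```r
# Hypothetical helper: impute each group separately, then put the rows
# back in their original order next to the grouping column.
# `impute_fun` takes a data frame and returns it with NAs filled in.
impute_by_group <- function(data, group_col, impute_fun) {
  value_cols <- setdiff(names(data), group_col)
  groups <- factor(data[[group_col]])          # factor() drops unused levels
  imputed <- lapply(split(data[value_cols], groups), impute_fun)
  out <- data[value_cols]
  for (g in names(imputed)) {
    out[which(groups == g), ] <- imputed[[g]]  # write each group back in place
  }
  cbind(data[group_col], out)
}

# Stand-in imputer (an assumption for illustration): per-column mean.
# A column with no observed value in the group is left as NA.
mean_impute <- function(x) {
  x[] <- lapply(x, function(col) {
    if (is.numeric(col) && any(!is.na(col))) {
      col[is.na(col)] <- mean(col, na.rm = TRUE)
    }
    col
  })
  x
}

# Toy data, not the asker's dataset
toy <- data.frame(
  phylo = c("A", "B", "A", "B", "A"),
  sv1 = c(1, 10, NA, 30, 3),
  sv2 = c(NA, 5, NA, 7, NA)
)
result <- impute_by_group(toy, "phylo", mean_impute)
print(result)
```

To use missForest instead, you could pass function(x) missForest::missForest(x)$ximp as impute_fun. Note that a group where a variable is missing in every row (as sv2 and sv4 are in the rows shown in the question) gives the imputer nothing to learn from within that group, so very small or fully missing groups will probably need separate handling.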
Upvotes: 1