Non_Praying_Mantis

Reputation: 25

How do I impute values by factor levels using 'missForest'?

I am trying to impute missing values in my dataframe with the non-parametric method available in missForest. My data (OneDrive link) consists of one categorical variable and five continuous variables.

head(data)
               phylo     sv1 sv2      sv3 sv4 sv5
1 Phaon_camerunensis 6.03803  NA 5121.257  NA  70
2   Umma_longistigma 6.03803  NA 5121.257  NA  53
3   Umma_longistigma 6.03803  NA 5121.257  NA  64
4   Umma_longistigma 6.03803  NA 5121.257  NA  63
5      Sapho_ciliata 6.03803  NA 5121.257  NA  63
6     Sapho_gloriosa 6.03803  NA 5121.257  NA  63

Imputing the whole dataset at once with missForest() works:

imp <- missForest(data[2:6])

However, instead of imputing across the whole data frame, I would like to impute missing values separately within each level of phylo.

I tried data[2:6] %>% group_by(phylo) %>% and sapply(split(data[2:6], data$phylo)) %>% but had no success.

Any guess on how to deal with it?

Upvotes: 0

Views: 253

Answers (2)

ielbadisy

Reputation: 1

Although the question is not very clear, I assume that you want to impute subsets of your dataset according to the phylo variable. To do that, split your dataset by your factor variable and apply the imputation function to each subset. This can be done using only base R functions:

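# convert phylo to a factor
data$phylo <- as.factor(data$phylo)

# split the continuous columns by phylo level and impute each subset separately
data2 <- lapply(split(data[2:6], data$phylo), function(x) missForest::missForest(x))

# data2 is a list with one missForest result per level;
# extract the imputed data frame (ximp) from each
lapply(data2, function(res) res$ximp)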
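
The per-level results can also be written back into a single data frame in the original row order. Below is a minimal sketch of that idea, not a definitive implementation: impute_by_group() is a hypothetical helper, the toy data is made up, and the median fallback is only a stand-in (it is not missForest) so the sketch still runs when the package is not installed. Note that a level where a variable is missing for every row, or a level with very few rows, gives the imputation nothing to learn from and will likely fail or be unreliable.

```r
# Hypothetical helper: impute the value columns separately within each group
# and write the results back in the original row order.
# `impute_fn` takes a data frame and returns the completed data frame.
impute_by_group <- function(data, group_col, impute_fn) {
  value_cols <- setdiff(names(data), group_col)
  groups <- factor(data[[group_col]])
  imputed <- lapply(split(data[value_cols], groups), impute_fn)
  out <- data[value_cols]
  split(out, groups) <- imputed  # puts each piece back into its own rows
  cbind(data[group_col], out)
}

# Made-up toy data with interleaved groups and a few missing values
set.seed(1)
toy <- data.frame(
  phylo = rep(c("A", "B"), times = 8),
  sv1 = rnorm(16), sv2 = rnorm(16, 10), sv3 = rnorm(16, 100)
)
toy$sv1[c(2, 11)] <- NA
toy$sv2[c(5, 14)] <- NA

impute_fn <- if (requireNamespace("missForest", quietly = TRUE)) {
  function(x) missForest::missForest(x)$ximp
} else {
  # Stand-in only (column medians), so the sketch runs without missForest
  function(x) {
    x[] <- lapply(x, function(col) {
      col[is.na(col)] <- median(col, na.rm = TRUE)
      col
    })
    x
  }
}

res <- impute_by_group(toy, "phylo", impute_fn)
head(res)
```

With the real data, the call would be along the lines of impute_by_group(data, "phylo", impute_fn), assuming every level has enough observed values in each column.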

Upvotes: 0

m0nhawk

Reputation: 24148

If you want to run missForest for each group, you can use group_map:

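# requires library(dplyr) and library(missForest);
# as.data.frame() because missForest expects a plain data frame, not a tibble
imp <- data %>% group_by(phylo) %>% group_map(~ missForest(as.data.frame(.x)))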

To get only the first item (the imputed data, ximp) from each group's result:

imp2 <- t(sapply(imp, "[[", 1))
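
If a single data frame is wanted instead of a list, dplyr's group_modify() is one option, since it row-binds whatever data frame the function returns for each group and adds the grouping column back. This is a rough sketch on made-up toy data, assuming dplyr and missForest are installed; note that the rows come back ordered by group rather than in their original order.

```r
library(dplyr)

# Made-up toy data: two groups, three continuous columns, a few NAs
set.seed(1)
toy <- data.frame(
  phylo = rep(c("A", "B"), each = 8),
  sv1 = rnorm(16), sv2 = rnorm(16, 10), sv3 = rnorm(16, 100)
)
toy$sv1[c(2, 11)] <- NA
toy$sv2[c(5, 14)] <- NA

imputed <- toy %>%
  group_by(phylo) %>%
  # .x holds the non-grouping columns of one group;
  # keep only the imputed data frame from each missForest result
  group_modify(~ missForest::missForest(as.data.frame(.x))$ximp) %>%
  ungroup()

head(imputed)
```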

Upvotes: 1
