Programming Noob

Reputation: 1332

Replacing NA values with a random value picked from another data frame

In my data I have two columns that contain NA values, so I made a new data frame with no NA values by removing the rows that contained them.

Each time there is an NA value in the original data (called metadata here), I want to randomly sample one value from the new data frame (called temp). Since I removed the NAs, there is no risk of picking NA again.

However, my original data does not change; it stays the same after running this:

temp = metadata %>% drop_na()
for (i in length(metadata$Gender)){
  if (is.na(metadata$Gender[[i]])) {
    metadata$Gender[[i]] = sample(temp$Gender, 1)
  }
  
  if (is.na(metadata$Age[[i]])){
    metadata$Age[[i]] = sample(temp$Age, 1)
  }
}

Upvotes: 0

Views: 41

Answers (1)

akrun

Reputation: 887501

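Instead of creating another object and replacing the NA values from it, we can loop across the columns of interest with across() and replace() the NA elements with a sample of the non-NA elements of the same column, setting size to the count of NA elements. Note that with replace = FALSE each column needs at least as many non-NA values as NA values, otherwise sample() raises an error; use replace = TRUE in that case.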

library(dplyr)
# For Gender and Age, fill each NA with a value drawn from that column's non-NA entries
metadata <- metadata %>%
    mutate(across(c(Gender, Age), ~ replace(.x, is.na(.x),
      sample(.x[!is.na(.x)], size = sum(is.na(.x)), replace = FALSE))))
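
As for why the loop in the question leaves the data unchanged: `for (i in length(metadata$Gender))` iterates over a single value, the length itself, so only the last row is ever checked. Below is a minimal sketch of the same loop with `seq_along()`. The toy `metadata` is hypothetical and only stands in for the real data, and `complete.cases()` is used here as a base R stand-in for `drop_na()`.

```r
set.seed(1)  # only to make the sketch reproducible

# Hypothetical toy data standing in for the asker's metadata
metadata <- data.frame(
  Gender = c("F", NA, "M", "F", NA),
  Age    = c(34, 51, NA, 28, NA)
)

# Rows with no NA in any column (same rows drop_na() would keep)
temp <- metadata[complete.cases(metadata), ]

# seq_along() yields 1, 2, ..., n; length() alone yields the single value n
for (i in seq_along(metadata$Gender)) {
  if (is.na(metadata$Gender[i])) {
    metadata$Gender[i] <- sample(temp$Gender, 1)
  }
  if (is.na(metadata$Age[i])) {
    # Pick a row index rather than calling sample(temp$Age, 1):
    # if temp$Age held a single number x, sample(x, 1) would draw from 1:x
    metadata$Age[i] <- temp$Age[sample.int(nrow(temp), 1)]
  }
}
```

Unlike the across() version, this draws only from rows that are complete in both columns, which is what the question's temp does, and each draw is independent, so the same value can be picked more than once.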

Upvotes: 1
