Reputation: 33

Conditionally changing variables that match multiple columns in R

I am attempting to clean a survey data set and am having trouble with conditionals. Thanks to all who answered my last question, but this one is slightly different and stumping me too.

I have a dataset like the one below. I am trying to write a statement such that:

If X, Y, and Z are all #NULL! in a row, those #NULL! entries become NA.
If any of X, Y, or Z has a number in that row, the remaining #NULL! entries become 0.
Variable a stands in for the 90+ other variables in the set that I don't want to touch.

Here is an example dataset that I built that shows what I mean:

set.seed(2)
df <- data.frame(
  X = as.factor(sample(c("1.00", "#NULL!"), 10, replace = TRUE)),
  Y = as.factor(sample(c("2.00", "#NULL!"), 10, replace = TRUE)),
  Z = as.factor(sample(c("3.00", "#NULL!"), 10, replace = TRUE)),
  a = as.factor(sample(c("4.00", "#NULL!"), 10, replace = TRUE))
)
df

Output:

> df
        X      Y      Z      a
1    1.00   2.00 #NULL!   4.00
2    1.00   2.00   3.00 #NULL!
3  #NULL! #NULL! #NULL!   4.00
4  #NULL!   2.00   3.00   4.00
5    1.00 #NULL!   3.00 #NULL!
6  #NULL!   2.00   3.00 #NULL!
7  #NULL! #NULL!   3.00 #NULL!
8  #NULL! #NULL!   3.00   4.00
9  #NULL!   2.00 #NULL! #NULL!
10   1.00 #NULL!   3.00   4.00

In this case, all null values for X, Y, and Z should be made 0 except for row 3, where they should be made NA. Column a should remain untouched. Does anybody have an idea how to approach this? Several convoluted ifelse() statements haven't worked, and I've been trying to modify a dplyr script someone suggested for another problem but I can't get that to work either.

Thank you!

Upvotes: 2

Answers (2)

S. Ash

Reputation: 68

This is a roundabout way to do it, but converting your factors to numeric first makes it easier to get the result.

library(dplyr)
library(tidyr)   # provides replace_na()

new.df <- df %>%
  mutate_at(vars(X, Y, Z), as.character) %>%                       # convert the factors to character first
  mutate_at(vars(X, Y, Z), ~ suppressWarnings(as.numeric(.x))) %>% # convert to numeric; "#NULL!" becomes NA
  mutate_at(vars(X, Y, Z), replace_na, replace = 0) %>%            # replace all NAs with 0
  mutate(TEST = ifelse(X == 0 & Y == 0 & Z == 0, NA, 0)) %>%       # flag rows where X, Y, and Z are all 0
  mutate(X = ifelse(is.na(TEST), NA_real_, X),
         Y = ifelse(is.na(TEST), NA_real_, Y),
         Z = ifelse(is.na(TEST), NA_real_, Z)) %>%                 # flagged rows become NA, as the question asks
  select(-TEST)                                                    # remove the test column; column a is never touched
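
With dplyr 1.0 or later, the same idea can probably be written more compactly with across(). The following is only a sketch: the three-row `sample_df`, the result name `cleaned`, and the helper column `all_null` are made up for illustration and are not part of the question's data.

```r
library(dplyr)

# Small hand-written sample in the same shape as the question's data (illustrative only)
sample_df <- data.frame(
  X = factor(c("1.00", "#NULL!", "#NULL!")),
  Y = factor(c("2.00", "#NULL!", "2.00")),
  Z = factor(c("#NULL!", "#NULL!", "3.00")),
  a = factor(c("4.00", "4.00", "#NULL!"))
)

cleaned <- sample_df %>%
  # flag rows where X, Y, and Z are all "#NULL!" before anything is changed
  mutate(all_null = X == "#NULL!" & Y == "#NULL!" & Z == "#NULL!") %>%
  # in X, Y, and Z only: "#NULL!" becomes "0", then the text is parsed as a number
  mutate(across(c(X, Y, Z),
                ~ as.numeric(replace(as.character(.x), .x == "#NULL!", "0")))) %>%
  # flagged rows become NA in X, Y, and Z
  mutate(across(c(X, Y, Z), ~ replace(.x, all_null, NA))) %>%
  select(-all_null)   # drop the helper column; a is left as it was

cleaned
```

In this sample, only the second row has X, Y, and Z all set to #NULL!, so it is the only row that ends up as NA in those three columns.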

Upvotes: 1

akrun

Reputation: 887213

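Since the columns are factors, add a level "0", rename the "#NULL!" level to "0", and then replace the rows in which X, Y, and Z are all "0" with NA. Restricting both steps to X, Y, and Z leaves column a untouched.

cols <- c("X", "Y", "Z")
df[cols] <- lapply(df[cols], function(x) {
  levels(x) <- c(levels(x), "0")             # add "0" as a level
  levels(x)[levels(x) == "#NULL!"] <- "0"    # rename "#NULL!" to "0"
  x
})

df[rowSums(df[cols] == "0") == length(cols), cols] <- NA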

If the OP wants numeric columns instead, start again from the original df:

cols <- c("X", "Y", "Z")
df[cols] <- lapply(df[cols], function(x)
  as.numeric(replace(as.character(x), x == "#NULL!", "0")))
df[rowSums(df[cols] == 0) == length(cols), cols] <- NA
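
One possible variation, sketched below in base R, is to record which rows are entirely #NULL! before replacing anything. The names `survey`, `cols`, and `all_null` and the three-row sample are assumptions made for illustration, not part of the original data.

```r
# Small hand-written sample in the same shape as the question's data (illustrative only)
survey <- data.frame(
  X = factor(c("1.00", "#NULL!", "#NULL!")),
  Y = factor(c("2.00", "#NULL!", "2.00")),
  Z = factor(c("#NULL!", "#NULL!", "3.00")),
  a = factor(c("4.00", "4.00", "#NULL!"))
)

cols <- c("X", "Y", "Z")   # columns to clean; a is deliberately excluded

# Identify rows where every one of X, Y, and Z is "#NULL!", before any replacement
all_null <- rowSums(survey[cols] == "#NULL!") == length(cols)

# "#NULL!" becomes "0", then the text is converted to numeric
survey[cols] <- lapply(survey[cols], function(x)
  as.numeric(replace(as.character(x), x == "#NULL!", "0")))

# Rows that were entirely "#NULL!" become NA in X, Y, and Z
survey[all_null, cols] <- NA

survey
```

Because the all-#NULL! rows are identified before the replacement, a genuine 0 in the data is not mistaken for a former #NULL! entry.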

Upvotes: 1
