Reputation: 33
I am attempting to clean a survey data set and am having trouble with conditionals. Thanks to all who answered my last question, but this one is slightly different and stumping me too.
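I have a dataset like the one below. I am trying to write a statement such that a "#NULL!" in X, Y, or Z becomes 0, unless all three are "#NULL!" in the same row, in which case they become NA.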
Here is an example dataset that I built that shows what I mean:
set.seed(2)
df <- data.frame(
  X = as.factor(sample(c("1.00", "#NULL!"), 10, replace = TRUE)),
  Y = as.factor(sample(c("2.00", "#NULL!"), 10, replace = TRUE)),
  Z = as.factor(sample(c("3.00", "#NULL!"), 10, replace = TRUE)),
  a = as.factor(sample(c("4.00", "#NULL!"), 10, replace = TRUE))
)
df
Output:
> df
        X      Y      Z      a
1    1.00   2.00 #NULL!   4.00
2    1.00   2.00   3.00 #NULL!
3  #NULL! #NULL! #NULL!   4.00
4  #NULL!   2.00   3.00   4.00
5    1.00 #NULL!   3.00 #NULL!
6  #NULL!   2.00   3.00 #NULL!
7  #NULL! #NULL!   3.00 #NULL!
8  #NULL! #NULL!   3.00   4.00
9  #NULL!   2.00 #NULL! #NULL!
10   1.00 #NULL!   3.00   4.00
In this case, every "#NULL!" in X, Y, and Z should become 0, except in row 3, where all three are "#NULL!" and should become NA instead. Column a should remain untouched. Does anybody have an idea how to approach this? Several convoluted ifelse() statements haven't worked, and I've been trying to modify a dplyr script someone suggested for another problem, but I can't get that to work either.
Thank you!
Upvotes: 2
Views: 179
Reputation: 68
This is a roundabout way to do it, but converting your factors to numeric first makes it easier to get the result.
library(dplyr)
library(tidyr)  # for replace_na()

new.df <- df %>%
  mutate_if(is.factor, as.character) %>%             # convert factors to character first
  mutate_if(is.character, as.numeric) %>%            # "#NULL!" becomes NA (with a coercion warning)
  mutate_if(is.numeric, replace_na, replace = 0) %>% # replace all NAs with 0
  mutate(TEST = ifelse(X == 0 & Y == 0 & Z == 0, NA, 0)) %>% # NA flags rows where X, Y, Z are all 0
  mutate(X = ifelse(is.na(TEST), "#NULL!", X),
         Y = ifelse(is.na(TEST), "#NULL!", Y),
         Z = ifelse(is.na(TEST), "#NULL!", Z)) %>%   # restore "#NULL!" in the flagged rows
  select(-TEST)                                      # remove test column
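If the goal is NA rather than "#NULL!" in the all-null rows, and column a should stay as it is, the same convert-then-test idea can be limited to X, Y, and Z. This is a minimal sketch, assuming dplyr 1.0 or later for across(). The names cols and all_null are made up for illustration, and the data frame is typed in from the printed output in the question so the result does not depend on the seed:

```r
library(dplyr)

# Example data typed in from the printed output in the question
df <- data.frame(
  X = c("1.00", "1.00", "#NULL!", "#NULL!", "1.00", "#NULL!", "#NULL!", "#NULL!", "#NULL!", "1.00"),
  Y = c("2.00", "2.00", "#NULL!", "2.00", "#NULL!", "2.00", "#NULL!", "#NULL!", "2.00", "#NULL!"),
  Z = c("#NULL!", "3.00", "#NULL!", "3.00", "3.00", "3.00", "3.00", "3.00", "#NULL!", "3.00"),
  a = c("4.00", "#NULL!", "4.00", "4.00", "#NULL!", "#NULL!", "#NULL!", "4.00", "#NULL!", "4.00"),
  stringsAsFactors = TRUE
)

cols <- c("X", "Y", "Z")  # hypothetical name: the columns to recode

# TRUE for rows where every one of X, Y, Z is "#NULL!" (row 3 here)
all_null <- rowSums(df[cols] == "#NULL!") == length(cols)

new.df <- df %>%
  mutate(across(all_of(cols), ~ {
    v <- as.character(.x)
    v[v == "#NULL!"] <- "0"          # null markers become "0" first
    v <- as.numeric(v)               # every value now parses as a number
    replace(v, all_null, NA_real_)   # then set the all-null rows to NA
  }))
```

Computing all_null before the conversion means a genuine 0 in the data would not be mistaken for a recoded null.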
Upvotes: 1
Reputation: 887213
As the columns are factors, create a level "0", change the level "#NULL!" to "0", and replace the rows that are all "0" with NA:
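cols <- c("X", "Y", "Z")  # recode only these; column a is left untouched
df[cols] <- lapply(df[cols], function(x) {
  levels(x) <- c(levels(x), "0")           # add the new level
  levels(x)[levels(x) == "#NULL!"] <- "0"  # merge "#NULL!" into it
  x
})
# rows where X, Y, and Z are all "0" become NA in those columns
df[rowSums(df[cols] == "0") == length(cols), cols] <- NA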
Assuming that the OP wants numeric columns as the result:
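cols <- c("X", "Y", "Z")  # recode only these; column a is left untouched
df[cols] <- lapply(df[cols], function(x)
  as.numeric(replace(as.character(x), x == "#NULL!", "0")))
# rows where X, Y, and Z are all 0 become NA in those columns
df[rowSums(df[cols] == 0) == length(cols), cols] <- NA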
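To reuse the numeric version on other column sets in the survey, it could be wrapped in a small function. This is only a sketch: recode_nulls, its arguments, and the three-row toy data are invented here for illustration and are not from the question.

```r
# Hypothetical helper: recode a null marker to 0 in the chosen columns,
# then set those columns to NA in rows where all of them held the marker.
recode_nulls <- function(data, cols, marker = "#NULL!") {
  is_marker <- data[cols] == marker  # logical matrix, one column per entry in cols
  all_marker <- rowSums(is_marker, na.rm = TRUE) == length(cols)
  data[cols] <- lapply(data[cols], function(x) {
    as.numeric(replace(as.character(x), as.character(x) == marker, "0"))
  })
  data[all_marker, cols] <- NA
  data
}

# Toy data (not the question's): row 2 is null in X, Y, and Z
toy <- data.frame(
  X = factor(c("1.00", "#NULL!", "#NULL!")),
  Y = factor(c("#NULL!", "#NULL!", "2.00")),
  Z = factor(c("3.00", "#NULL!", "3.00")),
  a = factor(c("4.00", "4.00", "#NULL!"))
)
out <- recode_nulls(toy, c("X", "Y", "Z"))
```

Passing the column names explicitly keeps columns such as a out of both the recoding and the all-null test.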
Upvotes: 1