BIN
BIN

Reputation: 781

Merge and change NA separately in R

My goal is merging 2 data sets together but I need to keep track "NA", I need to "NA" when I merge df1 and df2, and "NA" of df1 assign something like 9999. The problem is my data have so many variables, it includes day, numeric, character..., so when I try to set "NA" of df1 by df1[is.na(df1)] <- 9999, it works only numeric, there is any way I can approach my solution to have different NA seperately.

df1 <- data.frame(ID= c(1:10), 
              Value=c(3,NA,7,2:8),
              Group = c("A",NA,"C","D",NA,"B",NA,"C","D",NA))

df2 <- data.frame(ID = c(5:14),Count =c(1:9,NA),
                  School = c("A",NA,"C","D",NA,"B","NA","C","D",NA))

df1[is.na(df1)] <- 9999

data <- merge(df1,df2,all = TRUE,by= "ID")

   ID Value Group Count School
1   1     3     A    NA   <NA>
2   2  9999  <NA>*   NA   <NA>
3   3     7     C    NA   <NA>
4   4     2     D    NA   <NA>
5   5     3  <NA>*    1      A
6   6     4     B     2   <NA>
7   7     5   <NA>*    3      C
8   8     6     C     4      D
9   9     7     D     5   <NA>
10 10     8  <NA>*     6      B
11 11    NA  <NA>     7     NA
12 12    NA  <NA>     8      C
13 13    NA  <NA>     9      D
14 14    NA  <NA>    NA   <NA>

The ones * suppose 9999

Upvotes: 1

Views: 274

Answers (2)

jazzurro
jazzurro

Reputation: 23574

I'd like to contribute a bit more to this question. If you have, say, 100 columns in various classes and try to replace all NAs, you could try the following. The idea is that you convert all columns to character and replace all NAs with 9999. Then, you want to convert the classes of the columns back to the original classes. Finally, you merge df1 and df2.

library(dplyr)

# Save original classes.
original <- unlist(lapply(df1, class))

# Convert all columns to character and replace NAs with 9999
mutate_all(df1, as.character) %>%
mutate_each(funs(recode(., .missing = "9999"))) -> df1

# http://stackoverflow.com/questions/7680959/convert-type-of-multiple-columns-of-a-dataframe-at-once
# Credit to joran for this function.

convert.magic <- function(obj,types){

for (i in 1:length(obj)){
    FUN <- switch(types[i],character = as.character, 
                           numeric = as.numeric, 
                           factor = as.factor,
                           integer = as.integer,
                           logical = as.logical)
        obj[,i] <- FUN(obj[,i])
    }
    obj
}

out <- convert.magic(df1, original) %>%
       full_join(df2, by = "ID")

out

#   ID Value Group Count School
#1   1     3     A    NA   <NA>
#2   2  9999  9999    NA   <NA>
#3   3     7     C    NA   <NA>
#4   4     2     D    NA   <NA>
#5   5     3  9999     1      A
#6   6     4     B     2   <NA>
#7   7     5  9999     3      C
#8   8     6     C     4      D
#9   9     7     D     5   <NA>
#10 10     8  9999     6      B
#11 11    NA  <NA>     7     NA
#12 12    NA  <NA>     8      C
#13 13    NA  <NA>     9      D
#14 14    NA  <NA>    NA   <NA>

Upvotes: 1

Tim Biegeleisen
Tim Biegeleisen

Reputation: 520918

You can try replacing the NA values in df1$Group before doing the merge:

df1$Group <- as.character(df1$Group)
df1$Group[is.na(df1$Group)] <- 9999

But I have the feeling that you know this already, but were being thrown off because df1$Group is a factor, meaning the above code won't work as expected unless you first cast using as.character. You could also do this substitution after the merge.

Upvotes: 1

Related Questions