Boram Lim
Boram Lim

Reputation: 323

How to fill missing values from other variable?

My data set is composed as below.

x    y 
0     
0    0
2    2
     2
     4
2
7    7

I want to merge x and y variable like this

x  
0     
0  
2  
2
4
2
7

That is, I want to fill x variable values with y variable values if x are missing. If there is no missing in both x and y variables two values are always same.

Upvotes: 2

Views: 1647

Answers (4)

David Arenburg
David Arenburg

Reputation: 92292

If like you say that both of x and y are the same when there are no missing values, you can easily vectorize with rowMeans or use pmax (or pmin) combined with do.call

Your data

df <- data.frame(x = c(0,0,2,NA,NA,2,7), y = c(NA,0,2,2,4,NA,7))

Solution #1

rowMeans(df, na.rm = TRUE)
## [1] 0 0 2 2 4 2 7

Solution #2

do.call(pmax, c(df, na.rm = TRUE)) # or do.call(pmin, c(df, na.rm = TRUE))
## [1] 0 0 2 2 4 2 7

For comparison, this will lose only to @MrFlicks approach on a relatively big data

n <- 1e5
dftest <- data.frame(x = as.vector(replicate(n, df$x)),
                     y = as.vector(replicate(n, df$y)))

library(microbenchmark)
microbenchmark(ifelse(is.na(dftest$x), dftest$y, dftest$x),
               dftest$x[is.na(dftest$x)] <- dftest$y[is.na(dftest$x)],
               apply(dftest, 1, max, na.rm = TRUE),
               rowMeans(dftest, na.rm = TRUE),
               do.call(pmax, c(dftest, na.rm = TRUE)))

# Unit: milliseconds
#                                                   expr        min         lq       mean     median         uq       max neval
#            ifelse(is.na(dftest$x), dftest$y, dftest$x)  121.16554  132.17962  188.81260  162.88925  242.37786  452.3506   100
# dftest$x[is.na(dftest$x)] <- dftest$y[is.na(dftest$x)]   32.46432   34.13887   45.88664   36.78413   42.72560  138.9821   100
#                    apply(dftest, 1, max, na.rm = TRUE) 2284.13414 2428.15899 2554.03813 2501.33842 2605.78132 3567.5111   100
#                         rowMeans(dftest, na.rm = TRUE)   40.04718   44.39996   61.89289   48.16691   54.88427  189.2017   100
#                 do.call(pmax, c(dftest, na.rm = TRUE))   44.68004   45.66772   52.64246   46.43867   50.02424  149.1624   100

Upvotes: 2

daroczig
daroczig

Reputation: 28632

As your data is already in a data.frame it seems:

df <- data.frame(x = c(0,0,2,NA,NA,2,7), y = c(NA,0,2,2,4,NA,7))

Then simply get the larger value from the columns:

> apply(df, 1, max, na.rm = TRUE)
[1] 0 0 2 2 4 2 7

Upvotes: 2

Ron Harlev
Ron Harlev

Reputation: 16673

Another variant

x = ifelse(is.na(x), y,x)

Upvotes: 3

MrFlick
MrFlick

Reputation: 206207

Just copy the y values for for the missing x values

x <- c(0,0,2,NA,NA,2,7)
y <- c(NA,0,2,2,4,NA,7)

x[is.na(x)] <- y[is.na(x)]

Upvotes: 3

Related Questions