BIN
BIN

Reputation: 781

Fill in the same value for each row by group in R

The last time I asked the question about fill in the same value for each row by group in R, now I deal with exact the same problem but there are some missing value NA. Here is data,blank"" means that person doesn't exposed in that window, NA treats as missing, 1st means the person exposed in the first window..

ID <- c(1,1,2,2,2,3,3,4,4,4)
x <- c("1st","","1st","1st","","",NA,"1st",NA,"1st")
y <- c("2nd","2nd","","","","2nd","2nd","","",NA)
z <- c("","","3rd","3rd","",NA,"3rd","",NA,"")
m <- c(10:19)
n <- c(20:29)
df <- data.frame(ID,x,y,z,m,n)
library(data.table)
setDT(df)[, c("x1", "y1", "z1") := lapply(.SD, function(x) x[which.max(x !=   "")]), by = ID]

I got the output, it's pretty much the one I want except NA

    ID   x   y   z  m  n  x1  y1  z1
 1:  1 1st 2nd     10 20 1st 2nd    
 2:  1     2nd     11 21 1st 2nd    
 3:  2 1st     3rd 12 22 1st     3rd
 4:  2 1st     3rd 13 23 1st     3rd
 5:  2             14 24 1st     3rd
 6:  3     2nd  NA 15 25     2nd 3rd
 7:  3  NA 2nd 3rd 16 26     2nd 3rd
 8:  4 1st         17 27 1st        
 9:  4  NA      NA 18 28 1st        
10:  4 1st  NA     19 29 1st 

You can see row 6 and 7, ID is 3, it's supposed to fill x1 = NA, row 8,9,10, ID is 4, y1 and z1 will be NA, here is output I want

    ID   x   y   z  m  n  x1  y1  z1
 1:  1 1st 2nd     10 20 1st 2nd    
 2:  1     2nd     11 21 1st 2nd    
 3:  2 1st     3rd 12 22 1st     3rd
 4:  2 1st     3rd 13 23 1st     3rd
 5:  2             14 24 1st     3rd
 6:  3     2nd  NA 15 25 NA   2nd 3rd
 7:  3  NA 2nd 3rd 16 26 NA   2nd 3rd
 8:  4 1st         17 27 1st  NA  NA     
 9:  4  NA      NA 18 28 1st  NA  NA     
10:  4 1st  NA     19 29 1st  NA  NA

Upvotes: 1

Views: 1397

Answers (2)

akuiper
akuiper

Reputation: 214957

How about recoding condition for NA to be 0.5 which will prioritize NA over empty string but less than other strings:

df[, c("x1", "y1", "z1") := lapply(.SD, function(x) x[which.max(ifelse(is.na(x), 0.5, x != ""))]), by = ID]

df
#    ID   x   y   z  m  n  x1  y1  z1
# 1:  1 1st 2nd     10 20 1st 2nd    
# 2:  1     2nd     11 21 1st 2nd    
# 3:  2 1st     3rd 12 22 1st     3rd
# 4:  2 1st     3rd 13 23 1st     3rd
# 5:  2             14 24 1st     3rd
# 6:  3     2nd  NA 15 25  NA 2nd 3rd
# 7:  3  NA 2nd 3rd 16 26  NA 2nd 3rd
# 8:  4 1st         17 27 1st  NA  NA
# 9:  4  NA      NA 18 28 1st  NA  NA
#10:  4 1st  NA     19 29 1st  NA  NA

Upvotes: 1

Arun
Arun

Reputation: 118799

Just change:

x[which.max(x != "")]

to:

x[!x %in% c("", NA)][1L]

Upvotes: 4

Related Questions