carlos valiente
carlos valiente

Reputation: 61

How to update values from another table with conditions

I want to update values from table df1 with values from df2, only updating null values or zeros. I can do it with data.table or dplyr, but I can´t automate to all columns.

#data.table
df1 <- data.frame(x1=1:4, x2=c('a','b', NA, 'd'), x3=c(0,0,2,2), stringsAsFactors=FALSE)
df2 <- data.frame(x1=2:3, x2=c("zz", "qq"),x3=6:7, stringsAsFactors=FALSE)

require(data.table)
setDT(df1); setDT(df2)

df1[df2, on = .(x1), x2 := ifelse(is.na(x2) | x2 == 0 ,i.x2,x2)]

#dplyr
require(dplyr)
require(dplyr)
inner_join(df1,df2,by = c("x1" = "x1")) %>% 
  transmute(x1 = x1,
            x2 =ifelse(is.na(x2.x) | x2.x == 0,x2.y,x2.x),
            x3 =ifelse(is.na(x3.x) | x3.x == 0,x3.y,x3.x))

With dplyr at least I can manually adding columns getting the expected output, the problem is real dataframe has so much columns. Therefore I want to iterate across columns to achieved the task.

What I´ve tried:

# dplyr + apply
inner_join(df1,df2,by = c("x1" = "x1")) %>% 
  cbind(.$x1, 
        apply(.[-1],2, function(cname) ifelse(is.na(cname) | cname == 'b',paste(cname, ".x", collapse = ""),paste(cname, ".y", collapse = "")))
  )

# data.table with for
for (cname in names(df1)[!names(df1) %in% c("x1")]) {
  df1[i = df2, on = .(x1), j = cname := {function (x) ifelse(is.na(x) | x == 'b',i.x,x)} (cname)
        , with = FALSE]
} 

# data.table + lapply
df1[i = df2, on = .(x1)  ,names(df1)[!names(df1) %in% c("x1")] := lapply(df1[,names(df1)[!names(df1) %in% c("x1")],with=FALSE],
                           function(x) ifelse(is.na(x) | x == 0,df2.x,df1.x))]

Upvotes: 2

Views: 823

Answers (3)

Wimpel
Wimpel

Reputation: 27792

Here is a pure data.table approach...

The melting process takes care of all columns you wish to 'fill, putting them all in one single set of columns (variable and value). Then fill in all the 0/NA values using an update join (=fast!) Finally, recast everything back to it's original shape.

library(data.table)
#set to data.table
setDT(df1)
setDT(df2)
#melt to long
melt1 <- melt(df1, id.vars = "x1" )
melt2 <- melt(df2, id.vars = "x1" )
#join all values with value NA or 0
melt1[ is.na(value) | value == 0, 
       value := melt1[ is.na( value) | value == 0,][ melt2, value := i.value, on = .(x1, variable) ]$value][]
#cast to original wide format
dcast( melt1, x1 ~ variable )

output

#    x1 x2 x3
# 1:  1  a  0
# 2:  2  b  6
# 3:  3 qq  2
# 4:  4  d  2

Upvotes: 0

chinsoon12
chinsoon12

Reputation: 25223

For data.table, you can use:

for (x in setdiff(names(df1), "x1")) {
    df1[is.na(get(x)) ! get(x)==0, (x) := df2[.SD, on=.(x1), get(x)]]
}

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 389325

Using base R, you can create a function to replace NA and 0 with corresponding values from another column

replace_na_0 <- function(x) {
   ifelse(is.na(x[[1]]) | x[[1]] == 0,x[[2]],x[[1]])
}

Do merge and pass group of columns by removing their postfix (.x, .y) to replace_na_0 function

temp_df <- merge(df1, df2, by = "x1")

cbind(temp_df[1], sapply(split.default(temp_df[-1], 
       sub("\\..*", "", names(temp_df)[-1])), replace_na_0))

#  x1 x2 x3
#1  2  b  6
#2  3 qq  2

Upvotes: 2

Related Questions