Reputation: 7313

How to use merge or replace to update a table in R with multiple columns

I want to do something VERY similar to this question: how to use merge() to update a table in R

but instead of just one column being the index, I want to match the new values on an arbitrary number of columns >=1.

foo <- data.frame(index1=c('a', 'b', 'b', 'd','e'),index2=c(1, 1, 2, 3, 2), value=c(100,NA, 101, NA, NA))

Which has the following values

foo
  index1 index2 value
1      a      1   100
2      b      1    NA
3      b      2   101
4      d      3    NA
5      e      2    NA

And the data frame bar

bar <- data.frame(index1=c('b', 'd'),index2=c(1,3), value=c(200, 201))

Which has the following values:

 bar
  index1 index2 value
1      b      1   200
2      d      3   201

merge(foo, bar, by='index', all=T) It results in this output:

Desired output:

foo
  index1 index2 value
1      a      1   100
2      b      1   200
3      b      2   101
4      d      3   201
5      e      2    NA

Upvotes: 2

Answers (4)

AidanGawronski

Reputation: 2085

You can specify as many columns as you want with merge:

out <- merge(foo, bar, by=c("index1", "index2"), all.x=TRUE)
new <- apply(out[,3:4], 1, function(x) sum(x, na.rm=TRUE))
new <- ifelse(is.na(out[,3]) & is.na(out[,4]), NA, new)
out <- cbind(out[,1:2], new)

Upvotes: 0

agstudy

Reputation: 121608

I think you don't need a merge but more to rbind and filter them later. Here I am using data.table for sugar syntax.

dx <- rbind(bar,foo)
library(data.table)
setDT(dx)
## note this can be applied to any number of index
setkeyv(dx,grep("index",names(dx),v=T))
## using unqiue to remove all duplicated 
## here it will remove the duplicated with missing values which is the 
## expected behavior
unique(dx)

#    index1 index2 value
# 1:      b      1   200
# 2:      b      2   101
# 3:      d      3   201
# 4:      a      1   100
# 5:      e      2    NA

you can be more explicit and filter your rows by group of indexs:

 dx[,ifelse(length(value)>1,value[!is.na(value)],value),key(dx)]

Upvotes: 4

Jilber Urbina

Reputation: 61214

Here's an R base approach

> temp <- merge(foo, bar, by=c("index1","index2"), all=TRUE)
> temp$value <- with(temp, ifelse(is.na(value.x) & is.na(value.y), NA, rowSums(temp[,3:4], na.rm=TRUE)))
> temp <- temp[, -c(3,4)]
> temp
  index1 index2 value
1      a      1   100
2      b      1   200
3      b      2   101
4      d      3   201
5      e      2    NA

Upvotes: 1

jdobres

Reputation: 11957

You can use some dplyr voodoo to produce what you want. The following subsets the data by unique combinations of "index1" and "index2", and checks the contents of "value" for each subset. If "value" has any non-NA values, those are returned. If only an NA value is found, that is returned.

Seems a little specific, but it seems to do what you want!

library(dplyr)

df.merged <- merge(foo, bar, all = T) %>% 
  group_by(index1, index2) %>% 
  do(
    if (any(!is.na(.$value))) {
      return(subset(., !is.na(value)))
    } else {
      return(.)
    }
  )

Output:

  index1 index2 value
  <fctr> <fctr> <dbl>
1      a      1   100
2      b      1   200
3      b      2   101
4      d      3   201
5      e      2    NA

Upvotes: 0

How to use merge or replace to update a table in R with multiple columns

Answers (4)

Related Questions