Reputation: 7313
I want to do something VERY similar to this question: how to use merge() to update a table in R
but instead of just one column being the index, I want to match the new values on an arbitrary number of columns >=1.
foo <- data.frame(index1=c('a', 'b', 'b', 'd','e'),index2=c(1, 1, 2, 3, 2), value=c(100,NA, 101, NA, NA))
Which has the following values
foo
index1 index2 value
1 a 1 100
2 b 1 NA
3 b 2 101
4 d 3 NA
5 e 2 NA
And the data frame bar
bar <- data.frame(index1=c('b', 'd'),index2=c(1,3), value=c(200, 201))
Which has the following values:
bar
index1 index2 value
1 b 1 200
2 d 3 201
merge(foo, bar, by='index', all=T) It results in this output:
Desired output:
foo
index1 index2 value
1 a 1 100
2 b 1 200
3 b 2 101
4 d 3 201
5 e 2 NA
Upvotes: 2
Views: 1355
Reputation: 2085
You can specify as many columns as you want with merge:
out <- merge(foo, bar, by=c("index1", "index2"), all.x=TRUE)
new <- apply(out[,3:4], 1, function(x) sum(x, na.rm=TRUE))
new <- ifelse(is.na(out[,3]) & is.na(out[,4]), NA, new)
out <- cbind(out[,1:2], new)
Upvotes: 0
Reputation: 121608
I think you don't need a merge
but more to rbind
and filter them later. Here I am using data.table
for sugar syntax.
dx <- rbind(bar,foo)
library(data.table)
setDT(dx)
## note this can be applied to any number of index
setkeyv(dx,grep("index",names(dx),v=T))
## using unqiue to remove all duplicated
## here it will remove the duplicated with missing values which is the
## expected behavior
unique(dx)
# index1 index2 value
# 1: b 1 200
# 2: b 2 101
# 3: d 3 201
# 4: a 1 100
# 5: e 2 NA
you can be more explicit and filter your rows by group of indexs:
dx[,ifelse(length(value)>1,value[!is.na(value)],value),key(dx)]
Upvotes: 4
Reputation: 61214
Here's an R base approach
> temp <- merge(foo, bar, by=c("index1","index2"), all=TRUE)
> temp$value <- with(temp, ifelse(is.na(value.x) & is.na(value.y), NA, rowSums(temp[,3:4], na.rm=TRUE)))
> temp <- temp[, -c(3,4)]
> temp
index1 index2 value
1 a 1 100
2 b 1 200
3 b 2 101
4 d 3 201
5 e 2 NA
Upvotes: 1
Reputation: 11957
You can use some dplyr voodoo to produce what you want. The following subsets the data by unique combinations of "index1" and "index2", and checks the contents of "value" for each subset. If "value" has any non-NA values, those are returned. If only an NA value is found, that is returned.
Seems a little specific, but it seems to do what you want!
library(dplyr)
df.merged <- merge(foo, bar, all = T) %>%
group_by(index1, index2) %>%
do(
if (any(!is.na(.$value))) {
return(subset(., !is.na(value)))
} else {
return(.)
}
)
Output:
index1 index2 value
<fctr> <fctr> <dbl>
1 a 1 100
2 b 1 200
3 b 2 101
4 d 3 201
5 e 2 NA
Upvotes: 0