Renato Dinhani
Renato Dinhani

Reputation: 36666

How to merge two datasets by the different values in R?

I have two datasets and want to merge them. How I add to first dataset only the lines that are in the second that are not in the first?

Only add to final dataset if the value not exists in the another dataset. An example dataset:

x = data.frame(id = c("a","c","d","g"), 
               value = c(1,3,4,7))
y = data.frame(id = c("b","c","d","e","f"),
               value = c(5,6,8,9,7))   

The merged dataset should look like (the order is not important):

a 1
b 5
c 3
d 4
e 9    
f 7
g 7

Upvotes: 1

Views: 1601

Answers (3)

Roman Luštrik
Roman Luštrik

Reputation: 70623

You already answered your own question, but just didn't realize it right away. :)

> merge(x,y,all=TRUE)
  id value
1  a     1
2  c     3
3  c     6
4  d     4
5  d     8
6  g     7
7  b     5
8  e     9
9  f     7

EDIT

I'm a bit dense here and I'm not sure where you're getting at, so I provide you with a shotgun approach. What I did was I merged the data.frames by id and copied values from x to y if y` was missing. Take whichever column you need.

> x = data.frame(id = c("a","c","d","g"), 
+   value = c(1,3,4,7))
> y = data.frame(id = c("b","c","d","e","f"),
+   value = c(5,6,8,9,7))
> xy <- merge(x, y, by = "id", all = TRUE)
> xy
  id value.x value.y
1  a       1      NA
2  c       3       6
3  d       4       8
4  g       7      NA
5  b      NA       5
6  e      NA       9
7  f      NA       7
> find.na <- is.na(xy[, "value.y"])
> xy$new.col <- xy[, "value.y"]
> xy[find.na, "new.col"] <- xy[find.na, "value.x"]
> xy
  id value.x value.y new.col
1  a       1      NA       1
2  c       3       6       6
3  d       4       8       8
4  g       7      NA       7
5  b      NA       5       5
6  e      NA       9       9
7  f      NA       7       7

> xy[order(as.character(xy$id)), ]
  id value.x value.y new.col
1  a       1      NA       1
5  b      NA       5       5
2  c       3       6       6
3  d       4       8       8
6  e      NA       9       9
7  f      NA       7       7
4  g       7      NA       7

Upvotes: 1

Joshua Ulrich
Joshua Ulrich

Reputation: 176638

For your example to work, you first need to ensure that id in each data.frame are directly comparable. Since they're factors, you need ensure they have the same levels/labels; or you can just convert them to character.

# convert factors to character
x$id <- as.character(x$id)
y$id <- as.character(y$id)
# merge
z <- merge(x,y,by="id",all=TRUE)
# keep first value, if it exists
z$value <- ifelse(is.na(z$value.x),z$value.y,z$value.x)
# keep desired columns
z <- z[,c("id","value")]
z
#   id value
# 1  a     1
# 2  b     5
# 3  c     3
# 4  d     4
# 5  e     9
# 6  f     7
# 7  g     7

Upvotes: 2

Luciano Selzer
Luciano Selzer

Reputation: 10016

Using !, %in% and rbind:

rbind(x[!x$id %in% y$id,], y)
   id value
1   a     1
4   g     7
3   b     2
41  c     3
5   d     4
6   e     5
7   f     6

Upvotes: 2

Related Questions