Reputation: 171
I'm trying to merge two data frames with different dimensions, but just can't seem to get the result I'm after. Here's an example of what I'm trying to achieve:
Data frame A looks like this:
id value
A X
A Y
A Z
B Y
C X
C Z
Data frame B looks like this:
id value
A U
A W
B U
B W
B V
C V
And the data frame I'm trying to create looks like:
id value.A value.B
A X U
A Y W
A Z NA
B Y U
B NA W
B NA V
C X V
C Z NA
I've been trying to merge the data frames with the base merge
command, but keep getting repeated values instead of NA
's. It also seems to expand the resulting data frame to allow for unique combinations of value.A
and value.B
, which I'm also not interested in. Finally, not all ID's in data frame B have a matching ID in data frame A, in which case I would like value.B
to be NA
for that ID.
Is there a way to achieve what I'm looking for with merge
or am I looking for another (set of) command(s)?
Edit:
Just to clarify, I did try the various combinations of joins, but still haven't found a solution.
Example:
A <- data.frame(id = c("A", "A", "A", "B", "C", "C"),
value = c("X", "Y", "Z", "Y", "X", "Z"))
B <- data.frame(id = c("A", "A", "B", "B", "B", "C"),
value = c("U", "W", "U", "W", "V", "V"))
merge(A, B, by="id", all=TRUE)
Produces:
id value.x value.y
A X U
A X W
A Y U
A Y W
A Z U
A Z W
B Y U
B Y W
B Y V
C X V
C Z V
That is to say, a much larger data frame than what I'm looking for as the values are repeated for every unique pair of values in data frame A and B.
Edit 2: Looks like I simplified my initial example too much, so just to clarify why the order of my values matters, I'll have to expand the original example a bit:
Data frame A looks like this:
id value rank
A X 1
A Y 0.5
A Z 0.2
B Y 1
C X 1
C Z 0.8
Data frame B looks like this:
id value rank
A U 1
A W 0.6
B U 1
B W 0.2
B V 0.1
C V 1
And the intended data frame looks like this:
id value.A rank.A value.B rank.B
A X 1 U 1
A Y 0.5 W 0.6
A Z 0.2 NA NA
B Y 1 U 1
B NA NA W 0.2
B NA NA V 0.1
C X 1 V 1
C Z 0.8 NA NA
Upvotes: 0
Views: 983
Reputation: 38500
Here is my fancy cbind
. I create secondary IDs that assume the data is ordered in with corresponding rows and df2 may end before df for each id:
# add secondary IDs
df$idCnt <- rep_len(1L, length=nrow(df))
df$idCnt <- ave(df$idCnt, df$id, FUN=cumsum)
df2$idCnt <- rep_len(1L, length=nrow(df2))
df2$idCnt <- ave(df2$idCnt, df2$id, FUN=cumsum)
merge(df, df2, by=c("id", "idCnt"), all=TRUE)
id idCnt value.x value.y
1 A 1 X U
2 A 2 Y W
3 A 3 Z <NA>
4 B 1 Y U
5 B 2 <NA> W
6 B 3 <NA> V
7 C 1 X V
8 C 2 Z <NA>
Here is the data:
**data**
df <- read.table(header=T, text="id value
A X
A Y
A Z
B Y
C X
C Z", stringsAsFactors=F)
df2 <- read.table(header=T, text="id value
A U
A W
B U
B W
B V
C V", stringsAsFactors=F)
Upvotes: 2
Reputation: 1335
You want the 'all' argument in merge function.
merge(a, b, all = TRUE)
Upvotes: 0