Reputation: 171

R: Combining data frames of different length without recycling values

I'm trying to merge two data frames with different dimensions, but just can't seem to get the result I'm after. Here's an example of what I'm trying to achieve:

Data frame A looks like this:

id        value
A         X
A         Y
A         Z
B         Y
C         X
C         Z

Data frame B looks like this:

id        value
A         U
A         W
B         U
B         W
B         V
C         V

And the data frame I'm trying to create looks like:

id        value.A        value.B
A         X              U
A         Y              W
A         Z              NA
B         Y              U
B         NA             W
B         NA             V
C         X              V
C         Z              NA

I've been trying to merge the data frames with the base merge command, but keep getting repeated values instead of NA's. It also seems to expand the resulting data frame to allow for unique combinations of value.A and value.B, which I'm also not interested in. Finally, not all ID's in data frame B have a matching ID in data frame A, in which case I would like value.B to be NA for that ID.

Is there a way to achieve what I'm looking for with merge or am I looking for another (set of) command(s)?

Edit:

Just to clarify, I did try the various combinations of joins, but still haven't found a solution.

Example:

A <- data.frame(id = c("A", "A", "A", "B", "C", "C"),
                value = c("X", "Y", "Z", "Y", "X", "Z"))

B <- data.frame(id = c("A", "A", "B", "B", "B", "C"),
                value = c("U", "W", "U", "W", "V", "V"))

merge(A, B, by="id", all=TRUE)

Produces:

id    value.x    value.y
A     X          U
A     X          W
A     Y          U
A     Y          W
A     Z          U
A     Z          W
B     Y          U
B     Y          W
B     Y          V
C     X          V
C     Z          V

That is to say, a much larger data frame than what I'm looking for as the values are repeated for every unique pair of values in data frame A and B.

Edit 2: Looks like I simplified my initial example too much, so just to clarify why the order of my values matters, I'll have to expand the original example a bit:

Data frame A looks like this:

id        value        rank
A         X            1
A         Y            0.5
A         Z            0.2
B         Y            1
C         X            1
C         Z            0.8

Data frame B looks like this:

id        value        rank
A         U            1
A         W            0.6
B         U            1
B         W            0.2
B         V            0.1
C         V            1

And the intended data frame looks like this:

id        value.A        rank.A        value.B        rank.B
A         X              1             U              1
A         Y              0.5           W              0.6
A         Z              0.2           NA             NA
B         Y              1             U              1
B         NA             NA            W              0.2
B         NA             NA            V              0.1
C         X              1             V              1
C         Z              0.8           NA             NA

Upvotes: 0

Answers (2)

lmo

Reputation: 38500

Here is my fancy cbind. I create secondary IDs that assume the data is ordered in with corresponding rows and df2 may end before df for each id:

# add secondary IDs
df$idCnt <- rep_len(1L, length=nrow(df))
df$idCnt <- ave(df$idCnt, df$id, FUN=cumsum)

df2$idCnt <- rep_len(1L, length=nrow(df2))
df2$idCnt <- ave(df2$idCnt, df2$id, FUN=cumsum)

merge(df, df2, by=c("id", "idCnt"), all=TRUE)
id idCnt value.x value.y
1  A     1       X       U
2  A     2       Y       W
3  A     3       Z    <NA>
4  B     1       Y       U
5  B     2    <NA>       W
6  B     3    <NA>       V
7  C     1       X       V
8  C     2       Z    <NA>

Here is the data:

**data**
df <- read.table(header=T, text="id        value
A         X
A         Y
A         Z
B         Y
C         X
C         Z", stringsAsFactors=F)

df2 <- read.table(header=T, text="id        value
A         U
A         W
B         U
B         W
B         V
C         V", stringsAsFactors=F)

Upvotes: 2

mkearney

Reputation: 1335

You want the 'all' argument in merge function.

merge(a, b, all = TRUE)

Upvotes: 0

R: Combining data frames of different length without recycling values

Answers (2)

Related Questions