Tomas Greif
Tomas Greif

Reputation: 22623

Join two dataframes where column names are in rows

I do have the following structure of data:

x <- read.table(header=T, text="
variable class value
a a1 1
a a2 2
a a3 3
b b1 4
b b2 5
b b3 6
c c1 7
c c2 8
c a3 9")

y <- read.table(header=T, text="
a b c
a1 b2 c2
a2 b1 c1
a3 b3 a3"
)

Now I need to add three variables to df y - out_a, out_b, out_c where I need to map values in x$value to df y based on column name and class. The output should look like the following:

a b c a_out b_out c_out
a1 b2 c3 1 5 8
a2 b1 c1 2 4 7
a3 b3 c2 3 6 9

I can use sqldf to do this:

sqldf("select y.*, x1.value as a_out , x2.value as b_out, x3.value as c_out
        from 
          y
          join x as x1 on (x1.class=y.a and x1.variable='a')
          join x as x2 on (x2.class=y.b and x2.variable='b')
          join x as x3 on (x3.class=y.c and x3.variable='c')
      ")

In real world, I have many columns (50+) and therefore I am looking for something more elegant.

Upvotes: 1

Views: 95

Answers (2)

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193497

Here's another approach:

## Convert "y" to a long data.frame
y2 <- stack(lapply(y, as.character))

## Reorder "x" according to "y2"
x2 <- x[match(do.call(paste, x[1:2]), do.call(paste, rev(y2))), ]

## Use ave to generate an "id" variable
x2$id <- ave(x2$variable, x2$variable, FUN = seq_along)

## "x2" now looks like this
x2
#   variable class value id
# 1        a    a1     1  1
# 2        a    a2     2  2
# 3        a    a3     3  3
# 5        b    b2     5  1
# 4        b    b1     4  2
# 6        b    b3     6  3
# 8        c    c2     8  1
# 7        c    c1     7  2
# 9        c    a3     9  3

## Use reshape to get your data in the wide format that you are looking for
reshape(x2, direction = "wide", idvar = "id", timevar = "variable")
#   id class.a value.a class.b value.b class.c value.c
# 1  1      a1       1      b2       5      c2       8
# 2  2      a2       2      b1       4      c1       7
# 3  3      a3       3      b3       6      a3       9

From there, it is pretty much cosmetic work.... Using some sub/gsub to rename the columns, and reordering them if necessary.

Upvotes: 2

David
David

Reputation: 9405

I'm sure there is a more elegant way to do this and I'm not 100% that I understand what you're trying to do, but I think this should do the trick:

for(col in names(y)){
  tmp <- x[x$variable == col,c("class","value")]
  y[,paste0(col,"_out")] <- tmp$value[match(as.character(y[,col]),as.character(tmp$class))]
}

   a  b  c a_out b_out c_out
1 a1 b2 c2     1     5     8
2 a2 b1 c1     2     4     7
3 a3 b3 a3     3     6     9

Upvotes: 2

Related Questions