bakas

Reputation: 323

Matching row number and column name of a data frame with values of another data frame

I have a sample data frame

samp_df <- data.frame(x1=c(1,3,5,7,9),x2=c(2,4,6,8,10))

> samp_df
  x1 x2
1  1  2
2  3  4
3  5  6
4  7  8
5  9 10

I have another data frame which contains the variables str and sis_str

samp2_df <- data.frame(str=c("x1","x1","x2","x2","x1"),sis_str=c(1,2,4,5,3),stringsAsFactors=TRUE)

> samp2_df
  str sis_str
1   x1       1
2   x1       2
3   x2       4
4   x2       5
5   x1       3

The objective is to create another variable, "sim", in samp2_df that holds the matching value from samp_df: sis_str gives the row of samp_df and str gives the column name of samp_df.

So the output should be

> samp2_df
  str sis_str sim
1  x1       1   1
2  x1       2   3
3  x2       4   8
4  x2       5  10
5  x1       3   5
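To pin down exactly what is being asked, here is a plain loop sketch (not taken from either answer) that does one lookup per row of samp2_df and reproduces the expected sim column. It assumes str is stored as character, which is why stringsAsFactors = FALSE is set explicitly.

```r
# Rebuild the question's data; str is kept as character on purpose
samp_df  <- data.frame(x1 = c(1, 3, 5, 7, 9), x2 = c(2, 4, 6, 8, 10))
samp2_df <- data.frame(str = c("x1", "x1", "x2", "x2", "x1"),
                       sis_str = c(1, 2, 4, 5, 3),
                       stringsAsFactors = FALSE)

# Reference version: sis_str is the row position, str is the column name
sim <- numeric(nrow(samp2_df))
for (i in seq_len(nrow(samp2_df))) {
  sim[i] <- samp_df[samp2_df$sis_str[i], samp2_df$str[i]]
}
samp2_df$sim <- sim
samp2_df
```

This is easy to read but does one data frame subset per row, which is what the vectorised answers below avoid.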

Upvotes: 1

Views: 86

Answers (2)

IRTFM

Reputation: 263301

I think using a two-column matrix as the index argument to "[" would be considerably faster than looping over rows if this were a problem of any size. See ?"[" for more information on this strategy:

samp_df[ cbind(samp2_df$sis_str, as.numeric(samp2_df$str)) ]
[1]  1  3  8 10  5

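Here as.numeric(samp2_df$str) turns the factor str into its level codes (1 for x1, 2 for x2), which happen to match the column positions in samp_df. If str is a character vector instead, use match(samp2_df$str, names(samp_df)) to get the column positions. Then just cbind the result to samp2_df: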

cbind(samp2_df, sim=samp_df[ cbind(samp2_df$sis_str, as.numeric(samp2_df$str)) ] )
  str sis_str sim
1  x1       1   1
2  x1       2   3
3  x2       4   8
4  x2       5  10
5  x1       3   5
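A self-contained sketch of the same two-column index, assuming str is character rather than a factor, so the column position is looked up by name with match() instead of as.numeric(). The data are rebuilt from the question.

```r
samp_df  <- data.frame(x1 = c(1, 3, 5, 7, 9), x2 = c(2, 4, 6, 8, 10))
samp2_df <- data.frame(str = c("x1", "x1", "x2", "x2", "x1"),
                       sis_str = c(1, 2, 4, 5, 3),
                       stringsAsFactors = FALSE)

# Column 1 of idx: row position; column 2: column position found by name
idx <- cbind(samp2_df$sis_str,
             match(samp2_df$str, names(samp_df)))

# "[" with a two-column matrix converts the data frame with as.matrix()
# and returns one element per row of idx
samp2_df$sim <- samp_df[idx]
samp2_df
```

Because the data frame goes through as.matrix(), this works cleanly when all columns share one type, as here; with mixed column types the extracted values would come back as character.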

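Edit: If the task is instead to match sis_str against the row names rather than the row numbers (which I took to be the integer indices), then this would work. The output below was produced on a modified copy of the example data (note the sis_str value of 23 in row 5), so it does not line up exactly with the question's data: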

cbind(samp2_df, sim2=samp_df[ cbind(match(samp2_df$sis_str,rownames(samp_df)),
                                    as.numeric(samp2_df$str)) ] )
  str sis_str sim sim2
1   1       1   1    1
2   1       2   3    3
3   2       4   8    8
4   2       5  10   NA
5   1      23  NA    9
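A small sketch of the row-name lookup on its own. The row names "a" to "e" and the lookup table are hypothetical, chosen only to show that a name with no match produces NA rather than a wrong value.

```r
# Hypothetical row names, chosen only to illustrate the lookup
samp_df <- data.frame(x1 = c(1, 3, 5, 7, 9), x2 = c(2, 4, 6, 8, 10),
                      row.names = c("a", "b", "c", "d", "e"))
# Hypothetical lookup table; "z" is deliberately not a row name of samp_df
lookup <- data.frame(str = c("x1", "x2", "x1"),
                     sis_str = c("b", "e", "z"),
                     stringsAsFactors = FALSE)

# match() gives the position of each name, or NA when it is not found
idx <- cbind(match(lookup$sis_str, rownames(samp_df)),
             match(lookup$str, names(samp_df)))

# An index-matrix row containing NA yields NA in the result
lookup$sim <- samp_df[idx]
lookup
```

This is the same mechanism that leaves NA in the sim2 column above wherever sis_str is not one of the row names.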

Upvotes: 2

Ronak Shah

Reputation: 388807

I am not sure if there is a better way, but one way to do this is with mapply. For each row we extract a (row, column) pair from samp_df: the row is samp2_df$sis_str, and the column is the numeric part of samp2_df$str, which we get by replacing the letters with an empty string ("").

samp2_df$sim <- mapply(function(x, y) samp_df[x, y], 
               samp2_df$sis_str, as.numeric(sub("[a-zA-Z]+", "", samp2_df$str)))

samp2_df
#  str sis_str sim
#1  x1       1   1
#2  x1       2   3
#3  x2       4   8
#4  x2       5  10
#5  x1       3   5
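The sub() step assumes the column names are letters followed by the column number. A variant sketch (not from the answer) passes the column name straight to "[", so it does not depend on that naming pattern. as.character() is there because indexing by a factor uses its integer codes, not its printed labels.

```r
samp_df  <- data.frame(x1 = c(1, 3, 5, 7, 9), x2 = c(2, 4, 6, 8, 10))
samp2_df <- data.frame(str = c("x1", "x1", "x2", "x2", "x1"),
                       sis_str = c(1, 2, 4, 5, 3))

# Same mapply idea, but index the column by name instead of by number;
# as.character() guards against str being a factor
samp2_df$sim <- mapply(function(row, col) samp_df[row, col],
                       samp2_df$sis_str, as.character(samp2_df$str),
                       USE.NAMES = FALSE)
samp2_df
```

Like the original, this makes one subset call per row, so the two-column matrix index is still the better choice for large inputs.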

Upvotes: 1
