Reputation: 323
I have a sample data frame
samp_df <- data.frame(x1=c(1,3,5,7,9),x2=c(2,4,6,8,10))
> samp_df
x1 x2
1 1 2
2 3 4
3 5 6
4 7 8
5 9 10
I have another data frame which contains variable str and sis_str
samp2_df <- data.frame(str=c("x1","x1","x2","x2","x1"),sis_str=c(1,2,4,5,3))
> samp2_df
str sis_str
1 x1 1
2 x1 2
3 x2 4
4 x2 5
5 x1 3
The objective is to create another variable "sim" in the samp2_df data frame that holds values looked up from samp_df: for each row, sis_str should match the row name of the first data frame and str should match its column name.
So the output should be
> samp2_df
str sis_str sim
1 1 1 1
2 1 2 3
3 2 4 8
4 2 5 10
5 1 3 5
Upvotes: 1
Views: 86
Reputation: 263301
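I think using a two-column matrix as an argument to "[" would be considerably faster if this were a problem of any size. See ?"[" for more information on this strategy. Note that as.numeric(samp2_df$str) relies on str being a factor (the default for strings in data.frame before R 4.0) whose level order matches the column order of samp_df: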
samp_df[ cbind(samp2_df$sis_str, as.numeric(samp2_df$str)) ]
[1] 1 3 8 10 5
Then just cbind that to samp2_df:
cbind(samp2_df, sim=samp_df[ cbind(samp2_df$sis_str, as.numeric(samp2_df$str)) ] )
str sis_str sim
1 x1 1 1
2 x1 2 3
3 x2 4 8
4 x2 5 10
5 x1 3 5
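If str is stored as character rather than as a factor, the column positions can be looked up by name instead. The following is a minimal sketch of that variant, not a replacement for the code above; it rebuilds the question's data with stringsAsFactors = FALSE (my assumption, so that it behaves the same on any R version) and uses match() against names(samp_df):

```r
# Rebuild the example data; stringsAsFactors = FALSE is an assumption made
# here so that str is a plain character vector on every R version.
samp_df  <- data.frame(x1 = c(1, 3, 5, 7, 9), x2 = c(2, 4, 6, 8, 10))
samp2_df <- data.frame(str = c("x1", "x1", "x2", "x2", "x1"),
                       sis_str = c(1, 2, 4, 5, 3),
                       stringsAsFactors = FALSE)

# Two-column index matrix: column 1 is the row position,
# column 2 is the column position found by name.
idx <- cbind(samp2_df$sis_str, match(samp2_df$str, names(samp_df)))

# Convert to a matrix explicitly, then pick one cell per row of idx.
samp2_df$sim <- as.matrix(samp_df)[idx]
samp2_df$sim
```

Looking the column up by name avoids depending on the order of factor levels.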
Edit: If the task is instead to match sis_str against the row names rather than the row "numbers" (which I took to be the integer indices), then this would succeed:
cbind(samp2_df, sim2=samp_df[ cbind(match(samp2_df$sis_str,rownames(samp_df)),
as.numeric(samp2_df$str)) ] )
str sis_str sim sim2
1 1 1 1 1
2 1 2 3 3
3 2 4 8 8
4 2 5 10 NA
5 1 23 NA 9
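Matching on names can also be written with a two-column character matrix, which "[" matches against the dimnames of a matrix. This is only a sketch on the question's original data, assuming str is character and that every sis_str value appears among the row names (an unmatched name would raise a "subscript out of bounds" error rather than give NA):

```r
# Example data from the question; stringsAsFactors = FALSE is an assumption
# so that str is character regardless of R version.
samp_df  <- data.frame(x1 = c(1, 3, 5, 7, 9), x2 = c(2, 4, 6, 8, 10))
samp2_df <- data.frame(str = c("x1", "x1", "x2", "x2", "x1"),
                       sis_str = c(1, 2, 4, 5, 3),
                       stringsAsFactors = FALSE)

# Build a matrix that carries the data frame's row names explicitly.
m <- as.matrix(samp_df)
rownames(m) <- rownames(samp_df)

# Character index matrix: column 1 = row name, column 2 = column name.
name_idx <- cbind(as.character(samp2_df$sis_str), samp2_df$str)

samp2_df$sim2 <- m[name_idx]
samp2_df$sim2
```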
Upvotes: 2
Reputation: 388807
I am not sure if there is a better way, but one way to do this is by using mapply. We create a row-column pair for each value to be extracted from samp_df, where the row value is samp2_df$sis_str and the column value is the numeric part of samp2_df$str, which we get by replacing the letters with an empty string ("").
samp2_df$sim <- mapply(function(x, y) samp_df[x, y],
samp2_df$sis_str, as.numeric(sub("[a-zA-Z]+", "", samp2_df$str)))
samp2_df
# str sis_str sim
#1 x1 1 1
#2 x1 2 3
#3 x2 4 8
#4 x2 5 10
#5 x1 3 5
Upvotes: 1