Reputation: 51
Here is a small part of a data frame "df" with:
11 variables, "v1" to "v11",
and an index column "indx" (with 1 <= indx <= 11).
"indx" was obtained in a previous step on another data frame and was then merged into "df":
> df
v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 indx
1 223 0 95 605 95 0 0 0 0 189 0 10
2 32 0 0 32 0 26 0 0 0 32 0 6
3 0 0 127 95 64 32 0 0 0 350 0 10
4 141 0 188 0 361 0 0 0 0 145 0 3
5 32 0 183 0 127 0 0 0 0 246 0 3
6 67 0 562 0 0 0 0 0 0 173 0 3
7 64 0 898 0 6 0 0 0 0 0 0 3
8 0 0 16 0 32 0 0 0 0 55 0 10
9 0 0 165 0 0 0 312 0 0 190 0 10
10 0 0 210 0 0 0 190 0 0 11 0 7
I need to build a new column "vsel" whose value is "v(indx)", i.e. the value of the variable whose number is given by "indx"
(that is, for the first row: vsel=189 because indx=10 and v10=189).
I successfully obtained this result by using a "for" loop:
> df
v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 indx vsel
1 223 0 95 605 95 0 0 0 0 189 0 10 189
2 32 0 0 32 0 26 0 0 0 32 0 6 26
3 0 0 127 95 64 32 0 0 0 350 0 10 350
4 141 0 188 0 361 0 0 0 0 145 0 3 188
5 32 0 183 0 127 0 0 0 0 246 0 3 183
6 67 0 562 0 0 0 0 0 0 173 0 3 562
7 64 0 898 0 6 0 0 0 0 0 0 3 898
8 0 0 16 0 32 0 0 0 0 55 0 10 55
9 0 0 165 0 0 0 312 0 0 190 0 10 190
10 0 0 210 0 0 0 190 0 0 11 0 7 190
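The code is:
df$vsel <- NA
for (i in seq_len(nrow(df))) {
  r <- df[i, ]               # current row (a one-row data frame)
  ind <- r$indx              # number of the column to pick for this row
  df[i, "vsel"] <- r[[ind]]  # extract the single value in that column
}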
I would like to avoid this loop, as it is rather slow when the data frame is big.
There is probably a faster, more idiomatic R way:
maybe with apply(df, 1, ...)?
or ddply?
Thanks for any help.
Upvotes: 5
Views: 627
Reputation: 37764
Matrix indexing to the rescue! R has a way of doing exactly what you are describing. It is simple and powerful but surprisingly little-known.
df$vsel <- df[cbind(1:nrow(df), df$indx)]
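To make the mechanism concrete, here is a minimal sketch on a made-up data frame. The name "toy" and its values are purely illustrative and not taken from the question. When a data frame is indexed by a two-column numeric matrix, each row of that matrix is read as a (row, column) pair and exactly one cell is returned per pair.

```r
# Illustrative data only: three variables plus an index column
toy <- data.frame(v1 = c(10, 20, 30),
                  v2 = c(11, 21, 31),
                  v3 = c(12, 22, 32),
                  indx = c(3, 1, 2))

# Two-column index matrix: column 1 = row number, column 2 = column number
idx <- cbind(seq_len(nrow(toy)), toy$indx)

# Each row of idx selects one cell: (1,3), (2,1), (3,2)
toy$vsel <- toy[idx]
toy$vsel
# 12 20 31
```

Note that the data frame is converted to a matrix behind the scenes, so this sketch assumes all columns are numeric, as they are in the question.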
Upvotes: 6
Reputation: 14852
Here's a fully vectorized solution that is hard to beat in terms of speed.
df$vsel <- as.matrix(df)[1:nrow(df) + nrow(df)*(df$indx-1)]
This utilizes the fact that a matrix is internally stored as one long vector (column-wise). The 1:nrow(df) term thereby specifies the row, and nrow(df)*(df$indx-1) the offset of the column. This does not work if you have mixed data types in df, as everything would then be turned into strings by as.matrix.
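The index arithmetic can be checked on a small matrix. This is only a sketch; the names "m", "rows", "cols" and "lin" are made up for illustration. Because R stores matrices column by column, the cell in row r and column c sits at position r + nrow(m) * (c - 1) of the underlying vector.

```r
# 3 x 4 matrix filled column by column, so m[k] == k for every linear index k
m <- matrix(1:12, nrow = 3)

rows <- 1:3
cols <- c(4, 1, 2)

# Linear position of each (row, column) pair in the underlying vector
lin <- rows + nrow(m) * (cols - 1)   # 1 + 9, 2 + 0, 3 + 3

m[lin]                 # 10 2 6
m[cbind(rows, cols)]   # same cells, selected by matrix indexing

# The caveat about mixed types: one character column makes the whole matrix character
is.character(as.matrix(data.frame(a = 1, b = "x")))   # TRUE
```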
Upvotes: 1
Reputation: 12411
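You can do it like this:
f <- function(i) df[i, df$indx[i]]                # value in row i, column indx[i]
temp <- sapply(X = seq_len(nrow(df)), FUN = f)    # apply f to every row number
df <- cbind(df, vsel = temp)                      # append the result as column "vsel"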
Upvotes: 1