Phil

Reputation: 51

Create a new data frame column by picking a value from other columns according to an index column

Here is a small part of a data frame "df" with:

11 variables, "v1" to "v11",

and an index column "indx" (with 1 <= indx <= 11).

"indx" was obtained in a previous step on another data frame and then merged into "df":

> df
    v1 v2  v3  v4  v5 v6  v7 v8 v9 v10 v11 indx
1  223  0  95 605  95  0   0  0  0 189   0   10
2   32  0   0  32   0 26   0  0  0  32   0    6
3    0  0 127  95  64 32   0  0  0 350   0   10
4  141  0 188   0 361  0   0  0  0 145   0    3
5   32  0 183   0 127  0   0  0  0 246   0    3
6   67  0 562   0   0  0   0  0  0 173   0    3
7   64  0 898   0   6  0   0  0  0   0   0    3
8    0  0  16   0  32  0   0  0  0  55   0   10
9    0  0 165   0   0  0 312  0  0 190   0   10
10   0  0 210   0   0  0 190  0  0  11   0    7

I need to build a new column "vsel" whose value is "v(indx)"

(that is, for the first row: vsel = 189, because indx = 10 and v10 = 189).

I obtained this result with a "for" loop:

> df
    v1 v2  v3  v4  v5 v6  v7 v8 v9 v10 v11 indx vsel
1  223  0  95 605  95  0   0  0  0 189   0   10  189
2   32  0   0  32   0 26   0  0  0  32   0    6   26
3    0  0 127  95  64 32   0  0  0 350   0   10  350
4  141  0 188   0 361  0   0  0  0 145   0    3  188
5   32  0 183   0 127  0   0  0  0 246   0    3  183
6   67  0 562   0   0  0   0  0  0 173   0    3  562
7   64  0 898   0   6  0   0  0  0   0   0    3  898
8    0  0  16   0  32  0   0  0  0  55   0   10   55
9    0  0 165   0   0  0 312  0  0 190   0   10  190
10   0  0 210   0   0  0 190  0  0  11   0    7  190

The code is:

df$vsel <- NA
for (i in seq_len(nrow(df))) {
  ind <- df$indx[i]          # column position to pick for row i
  df$vsel[i] <- df[i, ind]   # value of v(indx) in row i
}

I would like to avoid this loop, as it is rather slow when the data frame is big.

There is probably a faster, more idiomatic R way:

maybe with apply(df, 1, ...)?

or ddply?

Thanks for any help.

Upvotes: 5

Views: 627

Answers (3)

Aaron - mostly inactive

Reputation: 37764

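Matrix indexing to the rescue! R has a way of doing exactly what you are describing: index with a two-column matrix, where each row of the matrix gives the (row, column) position of one element to extract. It is simple and powerful but surprisingly little-known.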

df$vsel <- df[cbind(1:nrow(df), df$indx)]
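To see what the index matrix looks like, here is a minimal sketch on a toy data frame. The data frame "d" and its values are made up for illustration and are not the asker's data; all of its columns are numeric.

```r
# Toy data frame (hypothetical values, only for illustration)
d <- data.frame(v1 = c(10, 20, 30),
                v2 = c(11, 21, 31),
                v3 = c(12, 22, 32),
                indx = c(3, 1, 2))

# Two-column index matrix: column 1 = row number, column 2 = column number
idx <- cbind(seq_len(nrow(d)), d$indx)

# Indexing with a two-column numeric matrix picks one element per matrix row
vsel <- d[idx]
vsel
# [1] 12 20 31
```

Row 1 takes column 3, row 2 takes column 1 and row 3 takes column 2, which is the per-row lookup the question asks for.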

Upvotes: 6

Backlin

Reputation: 14852

Here's a fully vectorized solution that is hard to beat in terms of speed.

df$vsel <- as.matrix(df)[1:nrow(df) + nrow(df)*(df$indx-1)]

This relies on the fact that a matrix is stored internally as one long vector, column by column. In the index, 1:nrow(df) selects the row and nrow(df)*(df$indx-1) is the offset to the start of the selected column. It does not work if df has mixed data types, because as.matrix would then convert everything to strings.
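The index arithmetic can be checked on a small matrix. This is only a sketch: "m", "rows", "cols" and "mixed" are made-up names and values, not part of the question.

```r
m <- matrix(1:12, nrow = 3)   # 3 x 4 matrix, filled column by column
rows <- c(1, 2, 3)
cols <- c(4, 1, 2)

# Element (r, c) sits at position r + nrow(m) * (c - 1) of the underlying vector
lin <- rows + nrow(m) * (cols - 1)
lin
# [1] 10  2  6

# Linear indexing and (row, column) matrix indexing pick the same elements
m[lin]
# [1] 10  2  6
m[cbind(rows, cols)]
# [1] 10  2  6

# The caveat about mixed types: one character column makes the whole matrix character
mixed <- data.frame(a = 1:2, b = c("x", "y"))
is.character(as.matrix(mixed))
# [1] TRUE
```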

Upvotes: 1

Pop

Reputation: 12411

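You can do it like this:

# value in row i, taken from the column whose position is df$indx[i]
f <- function(i) df[i, df$indx[i]]
temp <- sapply(seq_len(nrow(df)), f)
# returns a copy of df with the new column appended
cbind(df, vsel = temp)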

Upvotes: 1
