Extract from a data frame column based on a list of indices

Question

I have a data frame:

df = read.table(text="ID    location    C1  C2  C3  C4  C5  C6
M01 1   A   H   H   A   A   B
M02 2   A   H   A   A   A   B
M03 3   A   B   A   A   A   B
M04 4   H   B   H   A   A   B
M05 5   H   B   H   A   A   B
M06 6   A   B   H   A   A   H
M07 7   A   B   H   B   A   H
M08 8   A   B   H   A   A   H
M09 9   A   B   H   A   A   H
M10 10  B   B   H   A   A   H
M11 11  A   B   H   A   A   H
M12 12  A   B   H   A   A   H
M13 13  A   B   H   A   A   H
M14 14  B   B   B   A   A   H
M15 15  B   B   B   A   A   A", header=TRUE, stringsAsFactors=FALSE)

I would like to extract the values of df$ID using a list of row indices into df. The list a is:

a = list(C1 = c(3, 5, 9, 10, 13), C2 = c(2),
         C3 = c(1, 3, 13), C4 = c(6, 7), C6 = c(5, 14))

The expected result is:

$C1
[1] "M03" "M05" "M09" "M10" "M13"

$C2
[1] "M02"

$C3
[1] "M01" "M03" "M13"

$C4
[1] "M06" "M07"

$C6
[1] "M05" "M14"

akrun · Accepted Answer

This can be done easily with lapply, by looping over the list and extracting the 'ID' values at the indices stored in each list element:

lapply(a, function(x) df$ID[x])
#$C1
#[1] "M03" "M05" "M09" "M10" "M13"

#$C2
#[1] "M02"

#$C3
#[1] "M01" "M03" "M13"

#$C4
#[1] "M06" "M07"

#$C6
#[1] "M05" "M14"

Alternatively, Map gives a more compact call, although the result does not carry over the names of a:

Map(`[`, list(df$ID), a)
#[[1]]
#[1] "M03" "M05" "M09" "M10" "M13"

#[[2]]
#[1] "M02"

#[[3]]
#[1] "M01" "M03" "M13"

#[[4]]
#[1] "M06" "M07"

#[[5]]
#[1] "M05" "M14"

nchar("Map(`[`, list(df$ID), a)")
#[1] 24
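
The Map call above returns an unnamed list because Map takes its names from the first argument after the function, and list(df$ID) has none. A minimal sketch of two ways to keep the names of a, assuming the ID column has been copied into a hypothetical vector ids (not part of the original answer):

```r
# Hypothetical stand-in for df$ID from the question
ids <- sprintf("M%02d", 1:15)
a <- list(C1 = c(3, 5, 9, 10, 13), C2 = c(2),
          C3 = c(1, 3, 13), C4 = c(6, 7), C6 = c(5, 14))

# Option 1: make the named list the first mapped argument,
# so Map picks up its names
res1 <- Map(function(i) ids[i], a)

# Option 2: keep the compact call and restore the names afterwards
res2 <- setNames(Map(`[`, list(ids), a), names(a))

names(res1)
names(res2)
```

Either form should give the same named list as the lapply version; which one reads better is a matter of taste.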

Benchmarks

The benchmarks below use a vector ('v1') of one million values and a list ('a1') of 10,000 index vectors.

v1 <- paste0("M", 1:1e6)

If the values come from a data.frame column, extract the column into a vector first (v1 <- someDat$ID) to avoid repeated extractions.

set.seed(24)
a1 <- lapply(1:1e4, function(i) sample(1:1e6, sample(1e3), replace=FALSE))

system.time(relist(v1[unlist(a1, use.names = FALSE)], a1))
#   user  system elapsed
#   0.81    0.03    0.84

system.time(lapply(a1, function(x) v1[x]))
#   user  system elapsed
#   0.36    0.00    0.36

system.time(Map(`[`, list(v1), a1))
#   user  system elapsed
#   0.35    0.00    0.34
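
The relist() timing takes a different route from the two loops: it flattens all index vectors with unlist(), subsets the vector once, and then uses the original list as a skeleton to cut the result back into pieces. A minimal sketch on the small example from the question, with ids and flat_idx as hypothetical names introduced here for illustration:

```r
# Hypothetical stand-in for df$ID from the question
ids <- sprintf("M%02d", 1:15)
a <- list(C1 = c(3, 5, 9, 10, 13), C2 = c(2),
          C3 = c(1, 3, 13), C4 = c(6, 7), C6 = c(5, 14))

# Flatten every index vector into one numeric vector
flat_idx <- unlist(a, use.names = FALSE)

# Subset once, then let relist() split the result into pieces
# that match the lengths (and names) of the skeleton 'a'
res_relist <- relist(ids[flat_idx], skeleton = a)

# The loop-based version, for comparison
res_lapply <- lapply(a, function(x) ids[x])

# Compare the values produced by the two approaches
all.equal(res_relist, res_lapply)
```

In the timings above the single large subset did not pay off, which suggests that the cost of rebuilding the list outweighs the saving from one vectorised lookup, at least for this data.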

NOTE: Removing the {} (which we overlooked earlier) made little difference to the benchmarks. As stated earlier, it is better to create a vector object first (v1 <- someDat$ID) and benchmark against that, instead of extracting the column every time. Set up that way, the timings above compare only the subsetting approaches themselves.
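
As a rough illustration of the point about extracting the column once, the sketch below compares the two styles on a smaller, made-up data set. The data frame contents and the names idx, res_hoisted and res_inline are assumptions for this example only; timings will vary by machine, so no figures are shown.

```r
# Hypothetical data: 100,000 IDs and 1,000 index vectors of length 100
someDat <- data.frame(ID = paste0("M", 1:1e5), stringsAsFactors = FALSE)
set.seed(24)
idx <- lapply(1:1e3, function(i) sample(1e5, 100))

# Extract the column once, outside the loop
v1 <- someDat$ID
t_hoisted <- system.time(res_hoisted <- lapply(idx, function(x) v1[x]))

# Extract the column again on every iteration
t_inline <- system.time(res_inline <- lapply(idx, function(x) someDat$ID[x]))

# Both styles return the same values; only the work per iteration differs
identical(res_hoisted, res_inline)
t_hoisted
t_inline
```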
