user3354212

Reputation: 1112

Extract from a data frame column based on a list of indices

I have a data frame:

df = read.table(text="ID    location    C1  C2  C3  C4  C5  C6
M01 1   A   H   H   A   A   B
M02 2   A   H   A   A   A   B
M03 3   A   B   A   A   A   B
M04 4   H   B   H   A   A   B
M05 5   H   B   H   A   A   B
M06 6   A   B   H   A   A   H
M07 7   A   B   H   B   A   H
M08 8   A   B   H   A   A   H
M09 9   A   B   H   A   A   H
M10 10  B   B   H   A   A   H
M11 11  A   B   H   A   A   H
M12 12  A   B   H   A   A   H
M13 13  A   B   H   A   A   H
M14 14  B   B   B   A   A   H
M15 15  B   B   B   A   A   A", header=TRUE, stringsAsFactors=FALSE)

I would like to extract the values of df$ID using a list of row indices into df. The list a is:

a = list(C1 = c(3,   5,   9,   10,  13), C2 = c(2) , 
C3 = c(1,   3,   13 ), C4 =c(6,   7 ), C6 = c(5,   14 ))

The expected result is:

$C1
[1] "M03" "M05" "M09" "M10" "M13"

$C2
[1] "M02"

$C3
[1] "M01" "M03" "M13"

$C4
[1] "M06" "M07"

$C6
[1] "M05" "M14"

Upvotes: 2

Views: 2244

Answers (2)

akrun

Reputation: 887118

This can be done easily with lapply, by looping over the list and extracting the 'ID' values at the indices held in each list element

lapply(a, function(x) df$ID[x])
#$C1
#[1] "M03" "M05" "M09" "M10" "M13"

#$C2
#[1] "M02"

#$C3
#[1] "M01" "M03" "M13"

#$C4
#[1] "M06" "M07"

#$C6
#[1] "M05" "M14"

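Or we can use a more compact option with Map. Note that the result is unnamed here, because Map takes names from its first mapped argument, list(df$ID), which has none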

Map(`[`, list(df$ID), a)
#[[1]]
#[1] "M03" "M05" "M09" "M10" "M13"

#[[2]]
#[1] "M02"

#[[3]]
#[1] "M01" "M03" "M13"

#[[4]]
#[1] "M06" "M07"

#[[5]]
#[1] "M05" "M14"

nchar("Map(`[`, list(df$ID), a)")
#[1] 24
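If the element names of a matter, a small variation keeps them. The sketch below uses a stand-in vector ids in place of df$ID (an assumption, so that the snippet is self-contained); the names res_named and res_setnames are only illustrative. Mapping over a directly, or restoring the names with setNames, should give a named list.

```r
# Stand-in for df$ID (assumption: the same 15 IDs as in the question)
ids <- sprintf("M%02d", 1:15)
a <- list(C1 = c(3, 5, 9, 10, 13), C2 = c(2),
          C3 = c(1, 3, 13), C4 = c(6, 7), C6 = c(5, 14))

# Option 1: map over 'a' itself, so its names carry over to the result
res_named <- Map(function(i) ids[i], a)

# Option 2: keep the compact call and restore the names afterwards
res_setnames <- setNames(Map(`[`, list(ids), a), names(a))

names(res_named)
```

Option 1 is essentially the lapply call written with Map; Option 2 keeps the short form at the cost of one extra call.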

Benchmarks

Here, the benchmarks are based on a vector ('v1') with a list ('a1').

v1 <- paste0("M", 1:1e6)

If the values live in a data.frame column, extract the column once first (v1 <- someDat$ID) to avoid repeated extractions.

set.seed(24)
a1 <- lapply(1:1e4, function(i) sample(1:1e6, sample(1e3), replace=FALSE))

system.time(relist(v1[unlist(a1, use.names = FALSE)], a1))
#   user  system elapsed 
#   0.81    0.03    0.84 

system.time(lapply(a1, function(x) v1[x]))
#   user  system elapsed 
#   0.36    0.00    0.36 

system.time(Map(`[`, list(v1), a1))
#   user  system elapsed 
#   0.35    0.00    0.34 

NOTE: Removed the {} (which we overlooked earlier), but the benchmarks barely change. As stated earlier, it is better to create a vector object once (v1 <- someDat$ID) and benchmark against it instead of extracting the column every time. In that respect, this benchmark measures the indexing step itself rather than the repeated column extraction.

Upvotes: 3

Rich Scriven

Reputation: 99331

You could unlist the list a, index the data values with the result, then relist using a itself as the skeleton.

relist(df$ID[unlist(a)], a)
# $C1
# [1] "M03" "M05" "M09" "M10" "M13"
#
# $C2
# [1] "M02"
#
# $C3
# [1] "M01" "M03" "M13"
#
# $C4
# [1] "M06" "M07"
#
# $C6
# [1] "M05" "M14"

Additionally, we could gain some speed by dropping the names in unlist (use.names = FALSE).

relist(df$ID[unlist(a, use.names = FALSE)], a)
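To see why this works, the steps can be spelled out one at a time. This is only an illustrative sketch: ids is a stand-in for df$ID, and the intermediate names flat, picked, res_relist and res_lapply are hypothetical.

```r
# Stand-in for df$ID (assumption: the same 15 IDs as in the question)
ids <- sprintf("M%02d", 1:15)
a <- list(C1 = c(3, 5, 9, 10, 13), C2 = c(2),
          C3 = c(1, 3, 13), C4 = c(6, 7), C6 = c(5, 14))

# 1. Flatten all indices into a single vector, dropping the names
flat <- unlist(a, use.names = FALSE)

# 2. Index the IDs once, instead of once per list element
picked <- ids[flat]

# 3. Cut the flat result back into pieces shaped like 'a'
res_relist <- relist(picked, a)

# Compare with the lapply approach
res_lapply <- lapply(a, function(x) ids[x])
identical(res_relist, res_lapply)
```

Because relist copies the structure of its skeleton, the element names of a are kept in the result.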

Note:

The benchmarks in the other answer are misleading. Here is a more accurate benchmark that uses the actual code from the other answer, with $ extraction on every iteration, and removes the unnecessary {} braces around my expression:

df <- data.frame(v1 = paste0("M", 1:1e6))
set.seed(24)
a1 <- lapply(1:1e4, function(i) sample(1:1e6, sample(1e3), replace=FALSE))

system.time(relist(df$v1[unlist(a1, use.names = FALSE)], a1))
#   user  system elapsed 
#  0.485   0.004   0.489 
system.time(lapply(a1, function(x) df$v1[x]))
#   user  system elapsed 
#   0.39    0.00    0.39 
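Since a single system.time call can be noisy, it may help to check first that the approaches return the same result and then repeat each timing. The sketch below is not the benchmark from either answer: the data sizes are much smaller, each index vector has between 1 and 50 elements by assumption, and time_it is a hypothetical helper.

```r
set.seed(24)
v1 <- paste0("M", 1:1e4)
# 100 index vectors, each with 1 to 50 random positions (sizes are an assumption)
a1 <- lapply(1:100, function(i) sample(1e4, sample(50, 1)))

# All three approaches should agree before their speed is compared
r_lapply <- lapply(a1, function(x) v1[x])
r_map    <- Map(`[`, list(v1), a1)
r_relist <- relist(v1[unlist(a1, use.names = FALSE)], a1)
stopifnot(identical(r_lapply, r_map), identical(r_lapply, r_relist))

# Hypothetical helper: median elapsed time over n repeated runs
time_it <- function(f, n = 5) {
  median(replicate(n, system.time(f())[["elapsed"]]))
}

timings <- c(
  lapply = time_it(function() lapply(a1, function(x) v1[x])),
  Map    = time_it(function() Map(`[`, list(v1), a1)),
  relist = time_it(function() relist(v1[unlist(a1, use.names = FALSE)], a1))
)
timings
```

The absolute numbers depend on the machine and R version, so only the relative ordering across repeated runs is worth reading from the output.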

Upvotes: 8
