Reputation: 1112
I have a data frame:
df = read.table(text="ID location C1 C2 C3 C4 C5 C6
M01 1 A H H A A B
M02 2 A H A A A B
M03 3 A B A A A B
M04 4 H B H A A B
M05 5 H B H A A B
M06 6 A B H A A H
M07 7 A B H B A H
M08 8 A B H A A H
M09 9 A B H A A H
M10 10 B B H A A H
M11 11 A B H A A H
M12 12 A B H A A H
M13 13 A B H A A H
M14 14 B B B A A H
M15 15 B B B A A A", header=TRUE, stringsAsFactors=FALSE)
I would like to extract the values of df$ID based on a list of row indices into df. The list a is:
a = list(C1 = c(3, 5, 9, 10, 13), C2 = c(2),
         C3 = c(1, 3, 13), C4 = c(6, 7), C6 = c(5, 14))
The expected result is:
$C1
[1] "M03" "M05" "M09" "M10" "M13"
$C2
[1] "M02"
$C3
[1] "M01" "M03" "M13"
$C4
[1] "M06" "M07"
$C6
[1] "M05" "M14"
Upvotes: 2
Views: 2244
Reputation: 887118
This can be done with lapply by looping over the list and extracting the 'ID' values at the indices stored in each list element:
lapply(a, function(x) df$ID[x])
#$C1
#[1] "M03" "M05" "M09" "M10" "M13"
#$C2
#[1] "M02"
#$C3
#[1] "M01" "M03" "M13"
#$C4
#[1] "M06" "M07"
#$C6
#[1] "M05" "M14"
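Alternatively, Map offers a compact option. Note that the result is unnamed, because Map takes its names from the first argument after the function, which here is list(df$ID):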
Map(`[`, list(df$ID), a)
#[[1]]
#[1] "M03" "M05" "M09" "M10" "M13"
#[[2]]
#[1] "M02"
#[[3]]
#[1] "M01" "M03" "M13"
#[[4]]
#[1] "M06" "M07"
#[[5]]
#[1] "M05" "M14"
nchar("Map(`[`, list(df$ID), a)")
#[1] 24
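If the names of a should be kept, one option is to pass a as the first mapped argument, or to restore the names afterwards with setNames. The following is a minimal sketch, not part of the original answer; it rebuilds only the ID column of df (the other columns are not needed here) and assumes the same list a as in the question.

```r
# Stand-in for the question's data: only the ID column is needed
df <- data.frame(ID = sprintf("M%02d", 1:15), stringsAsFactors = FALSE)
a <- list(C1 = c(3, 5, 9, 10, 13), C2 = c(2),
          C3 = c(1, 3, 13), C4 = c(6, 7), C6 = c(5, 14))

# Option 1: make 'a' the first mapped argument so its names carry over
res1 <- Map(function(i) df$ID[i], a)

# Option 2: keep the compact call and reattach the names afterwards
res2 <- setNames(Map(`[`, list(df$ID), a), names(a))

names(res1)
names(res2)
```

Both variants should give a list named C1, C2, C3, C4 and C6, matching the lapply result.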
The benchmarks below use a vector ('v1') and a list of index vectors ('a1').
v1 <- paste0("M", 1:1e6)
If the values come from a data.frame column, extract it once (v1 <- someDat$ID) to avoid repeated extractions.
set.seed(24)
a1 <- lapply(1:1e4, function(i) sample(1:1e6, sample(1e3), replace=FALSE))
system.time(relist(v1[unlist(a1, use.names = FALSE)], a1))
# user system elapsed
# 0.81 0.03 0.84
system.time(lapply(a1, function(x) v1[x]))
# user system elapsed
# 0.36 0.00 0.36
system.time(Map(`[`, list(v1), a1))
# user system elapsed
# 0.35 0.00 0.34
NOTE: Removed the {} (which we overlooked earlier), but the benchmarks barely change. As stated earlier, it is better to create a vector object once (v1 <- someDat$ID) and benchmark against it instead of extracting the column every time. In that respect, this benchmark measures the indexing itself rather than the column extraction.
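As a small illustration of the point about extracting the column once, the sketch below compares the hoisted and the inline form on toy data. The data frame someDat is rebuilt here with made-up contents and idx is a hypothetical index list; neither comes from the benchmark above.

```r
# Toy data (assumed for illustration only)
someDat <- data.frame(ID = paste0("M", 1:1000), stringsAsFactors = FALSE)
idx <- list(first = c(1, 5), second = c(10, 20, 30))

# Hoisted: the column is extracted a single time, outside the loop
v1 <- someDat$ID
res_hoisted <- lapply(idx, function(x) v1[x])

# Inline: the column is extracted again for every list element
res_inline <- lapply(idx, function(x) someDat$ID[x])

# Both forms return the same values; only the amount of work differs
identical(res_hoisted, res_inline)
```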
Upvotes: 3
Reputation: 99331
You could unlist the list a, index the data values, then relist the result using a itself as the skeleton.
relist(df$ID[unlist(a)], a)
# $C1
# [1] "M03" "M05" "M09" "M10" "M13"
#
# $C2
# [1] "M02"
#
# $C3
# [1] "M01" "M03" "M13"
#
# $C4
# [1] "M06" "M07"
#
# $C6
# [1] "M05" "M14"
Additionally, we can gain some speed by dropping the names in unlist.
relist(df$ID[unlist(a, use.names = FALSE)], a)
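To make the three steps explicit, here is the same idea split into intermediate objects. This is only a sketch: ids stands in for df$ID, and the names flat and picked are introduced purely for illustration.

```r
ids <- sprintf("M%02d", 1:15)   # stand-in for df$ID
a <- list(C1 = c(3, 5, 9, 10, 13), C2 = c(2),
          C3 = c(1, 3, 13), C4 = c(6, 7), C6 = c(5, 14))

# Step 1: flatten all index vectors into one vector (names not needed)
flat <- unlist(a, use.names = FALSE)

# Step 2: a single vectorised subset instead of one subset per element
picked <- ids[flat]

# Step 3: cut the flat result back into the shape (and names) of 'a'
res <- relist(picked, a)
res
```

The subsetting happens once on the flattened indices, and relist then splits the result according to the element lengths of the skeleton.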
Note: The benchmarks in the other answer are misleading. Here is a more accurate benchmark: it runs the actual code from the other answer, which uses $ extraction on every iteration, and removes the unnecessary {} brackets around my expression.
df <- data.frame(v1 = paste0("M", 1:1e6))
set.seed(24)
a1 <- lapply(1:1e4, function(i) sample(1:1e6, sample(1e3), replace=FALSE))
system.time(relist(df$v1[unlist(a1, use.names = FALSE)], a1))
# user system elapsed
# 0.485 0.004 0.489
system.time(lapply(a1, function(x) df$v1[x]))
# user system elapsed
# 0.39 0.00 0.39
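Because a single system.time call can be noisy, one way to compare the approaches more robustly with base R alone is to repeat each timing and take the median elapsed time. The sketch below is an assumption-laden illustration, not a rerun of the benchmark above: the helper time_median is hypothetical, and the data are smaller so that it finishes quickly. No timings are shown because they depend on the machine.

```r
set.seed(24)
v1 <- paste0("M", 1:1e5)
# 1000 index vectors, each with a random length between 1 and 1000
a1 <- lapply(1:1e3, function(i) sample(1:1e5, sample(1e3, 1)))

# Hypothetical helper: median elapsed seconds over 'reps' runs of f()
time_median <- function(f, reps = 5) {
  median(replicate(reps, system.time(f())[["elapsed"]]))
}

t_lapply <- time_median(function() lapply(a1, function(x) v1[x]))
t_relist <- time_median(function() relist(v1[unlist(a1, use.names = FALSE)], a1))

c(lapply = t_lapply, relist = t_relist)
```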
Upvotes: 8