I have a column in the dataframe rand_sample that is a list of dataframes. I want to extract each dataframe, perform computations within it, and then add the results as new columns in rand_sample.
str(rand_sample[1, ]$times)
List of 1
$ :'data.frame': 13 obs. of 2 variables:
..$ white: num [1:13] 1800 1834 1875 1897 1887 ...
..$ black: num [1:13] 1800 1860 1946 2031 2114 ...
First index looks like this:
> rand_sample[1:10,]$times
[[1]]
white black
1 1800 1800
2 1834 1860
3 1875 1946
4 1897 2031
5 1887 2114
6 1839 2203
7 1835 2282
8 1880 2370
9 1875 2400
10 1892 2323
11 1612 2356
12 1622 2370
13 1619 2370
Essentially, what I want to do can be expressed in this for loop:
for (i in 1:nrow(rand_sample)) {
  current <- rand_sample[i, ]$times[[1]]
  mW <- abs(diff(current$white))
  mB <- abs(diff(current$black))
  maxWhite <- max(mW)
  minWhite <- min(mW)
  maxBlack <- max(mB)
  minBlack <- min(mB)
  sdWhite <- sd(mW)
  sdBlack <- sd(mB)
  avgW <- mean(mW)
  avgB <- mean(mB)
  rand_sample[i, ]$maxWhite <- maxWhite
  rand_sample[i, ]$minWhite <- minWhite
  rand_sample[i, ]$maxBlack <- maxBlack
  rand_sample[i, ]$minBlack <- minBlack
  rand_sample[i, ]$sdWhite <- sdWhite
  rand_sample[i, ]$sdBlack <- sdBlack
  rand_sample[i, ]$avgTimeWhite <- avgW
  rand_sample[i, ]$avgTimeBlack <- avgB
}
Two questions:
How do I extract just the dataframe from each list in the times column?
rand_sample$times[[1]]
gets me the dataframe for just the very first row. I want to be able to do something like
rand_sample$dataFrameTimes <- rand_sample$times[[1]]
so that the new column is a column of dataframes, not of one-element lists that each contain a dataframe.
How do I emulate the for loop via a faster mechanism? Running that for loop takes about 1 second per row. I have a dataset containing thousands of rows, so this is untenable.
Upvotes: 1
Views: 119
Consider turning the for loop into an lapply call that builds a list of one-row dataframes (one per row of rand_sample). Then run do.call(rbind, ...) on that list to combine it into a single dataframe, and finally cbind the result to rand_sample. The transform at the end removes the now unneeded times column:
dfList <- lapply(rand_sample$times, function(current) {
  mW <- abs(diff(current[[1]]$white))
  mB <- abs(diff(current[[1]]$black))
  data.frame(
    maxWhite = max(mW),
    minWhite = min(mW),
    maxBlack = max(mB),
    minBlack = min(mB),
    sdWhite = sd(mW),
    sdBlack = sd(mB),
    avgW = mean(mW),
    avgB = mean(mB)
  )
})
all_times <- do.call(rbind, dfList)
finaldf <- transform(cbind(rand_sample, all_times), times=NULL)
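Note that the code above indexes current[[1]], which assumes each element of times is a one-element list wrapping a dataframe. The str() output in the question suggests the elements may already be dataframes themselves, in which case current$white would be used directly. A minimal sketch of that variant follows, on a small made-up rand_sample; the helper name summarise_moves is hypothetical, and the times column is dropped by name before cbind rather than with transform:

```r
# Toy stand-in for rand_sample: 'times' is a list column whose elements
# are dataframes directly (assumed structure, based on the str() output).
set.seed(1)
rand_sample <- data.frame(ID = 1:3)
rand_sample$times <- lapply(1:3, function(i)
  data.frame(white = sample(1000:2000, 13),
             black = sample(1000:2000, 13)))

# Hypothetical helper: one row of summary statistics per dataframe.
summarise_moves <- function(current) {
  mW <- abs(diff(current$white))   # no [[1]] needed in this layout
  mB <- abs(diff(current$black))
  data.frame(maxWhite = max(mW), minWhite = min(mW),
             maxBlack = max(mB), minBlack = min(mB),
             sdWhite = sd(mW), sdBlack = sd(mB),
             avgW = mean(mW), avgB = mean(mB))
}

all_times <- do.call(rbind, lapply(rand_sample$times, summarise_moves))

# Drop the list column by name, then bind the statistics alongside.
finaldf <- cbind(rand_sample[names(rand_sample) != "times"], all_times)
```

Checking str(rand_sample$times[[1]]) on the real data should show which of the two layouts applies.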
Sample Input
rand_sample <- data.frame(
  ID = vapply(seq(50), function(i) sample(seq(15), 1, replace=TRUE), integer(1)),
  GROUP = vapply(seq(50), function(i) sample(LETTERS, 1, replace=TRUE), character(1))
)
rand_sample$times <- lapply(1:50, function(i)
  list(data.frame(white=sample(1000:2000, 50),
                  black=sample(1000:2000, 50))))
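On the first question (getting a column of plain dataframes instead of one-element lists), one possible approach with the nested layout used in this sample input is to unwrap one level with lapply. This is only a sketch on toy data; dataFrameTimes is the column name proposed in the question:

```r
# Toy data with the nested layout of the sample input: each element of
# 'times' is a one-element list wrapping a dataframe.
rand_sample <- data.frame(ID = 1:3)
rand_sample$times <- lapply(1:3, function(i)
  list(data.frame(white = sample(1000:2000, 5),
                  black = sample(1000:2000, 5))))

# Unwrap one level: take the first (and only) element of each wrapper list.
rand_sample$dataFrameTimes <- lapply(rand_sample$times, `[[`, 1)

class(rand_sample$dataFrameTimes[[2]])   # "data.frame"
```

Writing rand_sample$times[[1]] on its own selects only the first element of the column, which is why it cannot be assigned as a whole new column.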
Output
head(finaldf)
# ID GROUP maxWhite minWhite maxBlack minBlack sdWhite sdBlack avgW avgB
# 1 3 N 807 3 778 32 212.5353 177.5051 327.4082 297.3469
# 2 12 Q 858 2 892 7 261.3543 222.4173 356.1837 366.7143
# 3 6 R 749 13 910 8 208.5439 233.3391 324.6735 348.2041
# 4 5 V 892 8 886 20 246.3769 261.3922 356.7347 329.5306
# 5 4 O 842 5 886 2 200.1235 257.9464 350.2653 300.7347
# 6 3 T 790 17 908 53 204.7842 235.0276 319.7959 385.1224
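If building one small dataframe per row and rbind-ing thousands of them turns out to be a bottleneck, an alternative worth trying is to return a named numeric vector per row and let vapply assemble a matrix. This is a sketch, not benchmarked here; move_stats is a hypothetical helper name, and the toy data assumes the nested layout of the sample input:

```r
set.seed(42)
rand_sample <- data.frame(ID = 1:4)
rand_sample$times <- lapply(1:4, function(i)
  list(data.frame(white = sample(1000:2000, 20),
                  black = sample(1000:2000, 20))))

# Hypothetical helper returning the eight statistics as a named numeric vector.
move_stats <- function(wrapper) {
  current <- wrapper[[1]]            # unwrap the one-element list
  mW <- abs(diff(current$white))
  mB <- abs(diff(current$black))
  c(maxWhite = max(mW), minWhite = min(mW),
    maxBlack = max(mB), minBlack = min(mB),
    sdWhite = sd(mW), sdBlack = sd(mB),
    avgW = mean(mW), avgB = mean(mB))
}

# vapply yields an 8 x nrow matrix; transpose so rows line up with rand_sample.
stats <- t(vapply(rand_sample$times, move_stats, numeric(8)))

# Drop the list column by name, then bind the statistics alongside.
finaldf <- cbind(rand_sample[names(rand_sample) != "times"], stats)
```

vapply also checks that every row returns exactly eight numeric values, so a malformed element of times fails early instead of silently producing a ragged result.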
Upvotes: 1