Parseltongue

Reputation: 11697

Extract dataframe from a list of dataframes, and perform computations

I have a column in the dataframe rand_sample that is a list of dataframes. I want to extract each dataframe, perform computations on it, and then add the results as new columns in rand_sample.

str(rand_sample[1, ]$times)
List of 1
 $ :'data.frame':   13 obs. of  2 variables:
  ..$ white: num [1:13] 1800 1834 1875 1897 1887 ...
  ..$ black: num [1:13] 1800 1860 1946 2031 2114 ...

First index looks like this:

> rand_sample[1:10,]$times
[[1]]
   white black
1   1800  1800
2   1834  1860
3   1875  1946
4   1897  2031
5   1887  2114
6   1839  2203
7   1835  2282
8   1880  2370
9   1875  2400
10  1892  2323
11  1612  2356
12  1622  2370
13  1619  2370

Essentially, what I want to do can be expressed in this for loop:

for (i in 1:nrow(rand_sample)) {
  current <- rand_sample[i, ]$times[[1]]
  mW <- abs(diff(current$white))
  mB <- abs(diff(current$black))
  maxWhite <- max(mW)
  minWhite <- min(mW)
  maxBlack <- max(mB)
  minBlack <- min(mB)
  sdWhite <- sd(mW)
  sdBlack <- sd(mB)
  avgW <- mean(mW)
  avgB <- mean(mB)

  rand_sample[i, ]$maxWhite <- maxWhite
  rand_sample[i, ]$minWhite <- minWhite
  rand_sample[i, ]$maxBlack <- maxBlack
  rand_sample[i, ]$minBlack <- minBlack
  rand_sample[i, ]$sdWhite <- sdWhite
  rand_sample[i, ]$sdBlack <- sdBlack
  rand_sample[i, ]$avgTimeWhite <- avgW
  rand_sample[i, ]$avgTimeBlack <- avgB
}

Two questions:

  1. How do I extract just the dataframe from each list in the times column?

    rand_sample$times[[1]]
    

    This gets me only the dataframe for the very first row. I want to be able to do something like

    rand_sample$dataFrameTimes <- rand_sample$times[[1]]
    

    so that the new column is just a column of dataframes, not lists of one that each contain a dataframe.

  2. How do I emulate the for loop via a faster mechanism? Running that for loop takes about 1 second per row. I have a dataset containing thousands of rows, so this is untenable.

Upvotes: 1

Views: 119

Answers (1)

Parfait

Reputation: 107687

Consider turning the for loop into an lapply call that returns a list of one-row dataframes (one per row of rand_sample). Then run do.call(rbind, ...) to combine that list into a single dataframe, and finally cbind it to rand_sample. The transform at the end removes the now-unneeded times column:

dfList <- lapply(rand_sample$times, function(current) {

  # current is a list of one here (see Sample Input); [[1]] pulls out the dataframe
  mW <- abs(diff(current[[1]]$white))
  mB <- abs(diff(current[[1]]$black))

  data.frame(
    maxWhite = max(mW),
    minWhite = min(mW),
    maxBlack = max(mB),
    minBlack = min(mB),
    sdWhite = sd(mW),
    sdBlack = sd(mB),
    avgW = mean(mW),
    avgB = mean(mB)
  )
})

all_times <- do.call(rbind, dfList)

finaldf <- transform(cbind(rand_sample, all_times), times=NULL)
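One caveat: the str() output in the question suggests that each element of rand_sample$times may already be a bare dataframe rather than a list of one, in which case current[[1]] would pick out the white column instead of the dataframe. Below is a minimal sketch that tolerates either shape. unwrap_times is a hypothetical helper name, the three-row toy data is made up for illustration, and vapply is swapped in for lapply plus do.call(rbind, ...), so treat it as one possible variant rather than the method above.

```r
# Hypothetical helper: return the dataframe whether x is already a
# dataframe or a list of one wrapping it.
unwrap_times <- function(x) if (is.data.frame(x)) x else x[[1]]

# Toy data (made up): 3 rows, with bare dataframes stored in `times`.
set.seed(1)
rand_sample <- data.frame(ID = 1:3)
rand_sample$times <- lapply(1:3, function(i)
  data.frame(white = sample(1000:2000, 5), black = sample(1000:2000, 5)))

# Question 1: a plain list of dataframes, one per row of rand_sample.
times_list <- lapply(rand_sample$times, unwrap_times)

# Question 2: one named numeric vector of length 8 per row.
# vapply checks the length and type of every result.
stats <- vapply(times_list, function(d) {
  mW <- abs(diff(d$white))
  mB <- abs(diff(d$black))
  c(maxWhite = max(mW), minWhite = min(mW),
    maxBlack = max(mB), minBlack = min(mB),
    sdWhite = sd(mW), sdBlack = sd(mB),
    avgTimeWhite = mean(mW), avgTimeBlack = mean(mB))
}, numeric(8))

# vapply returns an 8 x nrow matrix, so transpose it before binding,
# and drop the times column at the same time.
finaldf <- cbind(rand_sample[setdiff(names(rand_sample), "times")], t(stats))
```

Because vapply builds a numeric matrix directly, this avoids creating one small dataframe per row, which may help on thousands of rows; it is worth timing both versions on the real data.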

Sample Input

rand_sample <- data.frame(
  ID = vapply(seq(50), function(i) sample(seq(15), 1, replace=TRUE), integer(1)),
  GROUP = vapply(seq(50), function(i) sample(LETTERS, 1, replace=TRUE), character(1))
)

rand_sample$times <- lapply(1:50, function(i) 
                            list(data.frame(white=sample(1000:2000, 50), 
                                            black=sample(1000:2000, 50))))

Output

head(finaldf)

#   ID GROUP maxWhite minWhite maxBlack minBlack  sdWhite  sdBlack     avgW     avgB
# 1  3     N      807        3      778       32 212.5353 177.5051 327.4082 297.3469
# 2 12     Q      858        2      892        7 261.3543 222.4173 356.1837 366.7143
# 3  6     R      749       13      910        8 208.5439 233.3391 324.6735 348.2041
# 4  5     V      892        8      886       20 246.3769 261.3922 356.7347 329.5306
# 5  4     O      842        5      886        2 200.1235 257.9464 350.2653 300.7347
# 6  3     T      790       17      908       53 204.7842 235.0276 319.7959 385.1224
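Since the sample input above is random, the numbers will differ from run to run. As a reproducible hand check, the same per-dataframe computation can be applied to the first dataframe printed in the question (values copied from that printout; nothing else is assumed):

```r
# Values copied from the first dataframe shown in the question.
current <- data.frame(
  white = c(1800, 1834, 1875, 1897, 1887, 1839, 1835,
            1880, 1875, 1892, 1612, 1622, 1619),
  black = c(1800, 1860, 1946, 2031, 2114, 2203, 2282,
            2370, 2400, 2323, 2356, 2370, 2370)
)

# Absolute change between consecutive rows, as in the loop and the lapply.
mW <- abs(diff(current$white))
mB <- abs(diff(current$black))

c(maxWhite = max(mW), minWhite = min(mW), avgW = mean(mW))
# maxWhite = 280, minWhite = 3, avgW = 43.25

c(maxBlack = max(mB), minBlack = min(mB), avgB = mean(mB))
# maxBlack = 89, minBlack = 0, avgB = 60.33333
```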

Upvotes: 1
