Shaxi Liver

Reputation: 1120

Extract different set of rows from data frames in list

It is going to be hard to give you a reproducible example, but in general this is supposed to be an easy task for many of you. My brain has not yet switched on after my midday coffee.

I have a list of 20-30 data frames. I would like to extract specific rows from each data frame. The pattern will be very repetitive.

From the first data frame, let's call it LD1, I would like to take rows 1:8, and for every following data frame the row numbers will be higher by 8: 9:16 for the second, 17:24 for the third, etc.

I would like to keep original names of these data frames.

Can someone switch on a light in my brain?

Upvotes: 1

Views: 110

Answers (5)

djbetancourt

Reputation: 349

Minimal reproducible example:

# works also if you have matrices instead of data frames
genDF.a <- genDF.b <- genDF.c <- data.frame(matrix(rep(1:100, 2), nrow = 100))
myList <- list(a = genDF.a, b = genDF.b, c = genDF.c)

Now the answer to your question:

# put the indices of the rows you want to extract in a list
myInds <- lapply(0:2, function(i) 1:8 + 8 * i)

# use mapply to loop over both the list of data frames and the list of indices;
# SIMPLIFY = FALSE keeps the result as a list instead of collapsing it into a matrix
mapply(function(M, ind) M[ind, ], myList, myInds, SIMPLIFY = FALSE)

Edit based on the comment of @Sotos

# use Map to loop over both, the list of matrices and the list of indices
Map(function(M, ind) M[ind,], myList, myInds)

You will obtain a list with the desired rows of each matrix along with the names from the original list.

I filled the rows of the data frames with their corresponding index so that it is easy to check it works.

The output:

$a
  X1 X2
1  1  1
2  2  2
3  3  3
4  4  4
5  5  5
6  6  6
7  7  7
8  8  8

$b
   X1 X2
9   9  9
10 10 10
11 11 11
12 12 12
13 13 13
14 14 14
15 15 15
16 16 16

$c
   X1 X2
17 17 17
18 18 18
19 19 19
20 20 20
21 21 21
22 22 22
23 23 23
24 24 24
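
The index list above is written for exactly three data frames (0:2). A minimal sketch of how it could be derived from the length of the list instead, assuming every data frame has at least 8 * length(list) rows; `extract_blocks`, `block` and `demo` are hypothetical names introduced only for this illustration:

```r
# Hypothetical helper: rows 1:8 from the first element, 9:16 from the second, etc.
extract_blocks <- function(dfs, block = 8) {
  # one index vector per data frame: (i - 1) * block + 1:block
  inds <- lapply(seq_along(dfs), function(i) (i - 1) * block + seq_len(block))
  # Map takes the result names from its first list argument
  Map(function(d, ind) d[ind, , drop = FALSE], dfs, inds)
}

demo <- data.frame(X1 = 1:100, X2 = 1:100)
res <- extract_blocks(list(a = demo, b = demo, c = demo))
names(res)  # names of the input list are kept
res$b$X1    # values taken from rows 9 to 16
```

drop = FALSE is there so that a data frame with a single column still comes back as a data frame rather than a vector.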

Upvotes: 1

tmfmnk

Reputation: 39858

One option involving purrr and dplyr could be:

library(purrr)
library(dplyr)

map2(.x = lst,
     .y = split(1:nrow(lst[[1]]),
                cut(1:nrow(lst[[1]]), c(0, cumsum(rep(5, length(lst) - 1)), Inf))),
     ~ .x %>%
       filter(row_number() %in% .y))

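Here, the number of rows taken from each data frame is the following (the last data frame keeps all remaining rows because the final break is Inf):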

$df1
[1] 5

$df2
[1] 5

$df3
[1] 5

$df4
[1] 17

It could be made slightly more compact:

df_nrow <- 1:nrow(lst[[1]])
n <- 5

map2(.x = lst,
     .y = split(df_nrow, 
                cut(df_nrow, c(0, cumsum(rep(n, length(lst)-1)), Inf))),
     ~ .x %>%
      filter(row_number() %in% .y))

Sample data:

lst <- list(df1 = mtcars,
            df2 = mtcars,
            df3 = mtcars,
            df4 = mtcars)

Upvotes: 1

user2974951

Reputation: 10375

Using mapply

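df <- list(a = mtcars, b = mtcars, c = mtcars)
ix <- list(1:8, 9:16, 17:24)
# wrapping each result in list() stops mapply from simplifying the data frames
mapply(function(x, y) { list(x[y, ]) }, x = df, y = ix)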
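
With 20-30 data frames, typing the index list by hand gets tedious. A minimal sketch, assuming blocks of 8 rows as in the question, that builds `ix` with split; the name `block` is hypothetical, and SIMPLIFY = FALSE is used here in place of the list() wrapper:

```r
df <- list(a = mtcars, b = mtcars, c = mtcars)
block <- 8  # rows per data frame (block size assumed from the question)

# split 1:24 into consecutive chunks of 8: 1:8, 9:16, 17:24
ix <- split(seq_len(block * length(df)), rep(seq_along(df), each = block))

# SIMPLIFY = FALSE returns a plain list named after df
res <- mapply(function(x, y) x[y, , drop = FALSE], df, ix, SIMPLIFY = FALSE)
```
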

Upvotes: 0

Sotos

Reputation: 51592

One idea is to use Map and create the indices using a simple mathematical formula which will work for any number of data frames in your list, i.e.

Map(function(x, y) x[seq(8) + y * 8, , drop = FALSE], l2, 0:(length(l2) - 1))

which gives,

$v1
   v1
1 444
2  52
3 345
4  48
5 375
6 491
7  10
8 126

$v1
    v1
9   57
10 354
11 239
12 205
13 273
14 172
15 345
16 293

$v1
    v1
17 366
18 487
19 423
20 194
21  18
22 476
23 151
24 382

$v1
    v1
25 131
26 245
27  10
28  41
29 248
30 104
31 163
32 187

$v1
    v1
33 335
34  44
35 442
36 362
37 470
38 145
39 384
40 257

where l2 is:

dput(l2)
list(v1 = structure(list(v1 = c(444L, 52L, 345L, 48L, 375L, 491L, 
10L, 126L, 231L, 124L, 494L, 476L, 213L, 208L, 35L, 327L, 294L, 
467L, 39L, 295L, 12L, 49L, 201L, 335L, 72L, 204L, 453L, 299L, 
157L, 355L, 380L, 348L, 309L, 117L, 404L, 304L, 222L, 287L, 500L, 
406L, 340L, 166L, 442L, 256L, 354L, 269L, 98L, 245L, 471L, 253L, 
15L, 130L, 434L, 329L, 465L, 18L, 346L, 389L, 185L, 238L)), row.names = c(NA, 
-60L), class = "data.frame"), v1 = structure(list(v1 = c(67L, 
461L, 68L, 420L, 59L, 291L, 391L, 275L, 57L, 354L, 239L, 205L, 
273L, 172L, 345L, 293L, 236L, 304L, 70L, 410L, 91L, 204L, 343L, 
386L, 400L, 482L, 221L, 190L, 340L, 328L, 367L, 36L, 95L, 229L, 
98L, 148L, 255L, 490L, 101L, 480L, 113L, 122L, 330L, 31L, 276L, 
18L, 192L, 243L, 178L, 240L, 297L, 75L, 381L, 144L, 71L, 208L, 
76L, 46L, 146L, 373L)), row.names = c(NA, -60L), class = "data.frame"), 
    v1 = structure(list(v1 = c(344L, 200L, 282L, 236L, 404L, 
    201L, 286L, 185L, 479L, 46L, 32L, 124L, 365L, 297L, 66L, 
    483L, 366L, 487L, 423L, 194L, 18L, 476L, 151L, 382L, 240L, 
    261L, 346L, 345L, 85L, 332L, 179L, 67L, 87L, 415L, 98L, 480L, 
    320L, 307L, 141L, 224L, 27L, 432L, 103L, 23L, 370L, 306L, 
    153L, 78L, 418L, 186L, 459L, 162L, 59L, 484L, 20L, 385L, 
    216L, 116L, 99L, 301L)), row.names = c(NA, -60L), class = "data.frame"), 
    v1 = structure(list(v1 = c(358L, 233L, 343L, 121L, 22L, 230L, 
    461L, 430L, 246L, 19L, 155L, 303L, 197L, 276L, 44L, 264L, 
    102L, 243L, 153L, 385L, 89L, 49L, 360L, 148L, 131L, 245L, 
    10L, 41L, 248L, 104L, 163L, 187L, 5L, 179L, 341L, 322L, 250L, 
    210L, 223L, 103L, 80L, 151L, 263L, 310L, 34L, 275L, 165L, 
    328L, 71L, 364L, 454L, 336L, 249L, 205L, 284L, 419L, 113L, 
    185L, 416L, 298L)), row.names = c(NA, -60L), class = "data.frame"), 
    v1 = structure(list(v1 = c(393L, 346L, 227L, 242L, 61L, 264L, 
    106L, 326L, 278L, 150L, 397L, 398L, 199L, 478L, 430L, 134L, 
    297L, 291L, 341L, 436L, 47L, 94L, 275L, 419L, 448L, 180L, 
    24L, 440L, 135L, 260L, 472L, 158L, 335L, 44L, 442L, 362L, 
    470L, 145L, 384L, 257L, 6L, 333L, 429L, 149L, 62L, 173L, 
    109L, 330L, 492L, 286L, 328L, 178L, 197L, 367L, 282L, 426L, 
    466L, 111L, 123L, 251L)), row.names = c(NA, -60L), class = "data.frame"))

Upvotes: 3

Patrick

Reputation: 168

You can use lapply and modify it to your needs. A working example:

# create some sample data
sample_list <- lapply(1:30, function(i) {
  tibble::tibble(x = i * 1:1000, y = 2 * x)
})

# number of rows to extract/skip
skip_no <- 8

# use lapply with an anonymous function over the list positions
lapply(seq_along(sample_list), function(i) {

  # first row to extract from the i-th data frame: 1, 9, 17, ...
  # (for i == 1 the formula already gives 1, so no special case is needed)
  current_index <- (i - 1) * skip_no + 1

  sample_list[[i]][current_index:(current_index + skip_no - 1), ]
})
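
Because lapply iterates over positions here rather than over the list itself, the names of a named list would not carry over to the result. A minimal sketch of one way to restore them, assuming a named input; `named_list` and its element names are hypothetical stand-ins for the data frames in the question:

```r
named_list <- list(LD1 = mtcars, LD2 = mtcars, LD3 = mtcars)
skip_no <- 8

res <- lapply(seq_along(named_list), function(i) {
  start <- (i - 1) * skip_no + 1
  named_list[[i]][start:(start + skip_no - 1), , drop = FALSE]
})

# copy the original names onto the result
names(res) <- names(named_list)
```
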


Upvotes: 0
