colinf299
colinf299

Reputation: 19

Subsetting a list of data frames by condition

Sorry I can't embed pictures yet

I have 21 data frames in a list (listb), all with the same headings of Timestamp, Rainfall I would like to sort them by Rainfall (descending) and then subset the top 30 (to include the corresponding Timestamp) of each of the 21 data frames. Then put them back into a single dataframe with the name of the initial data frame as a heading?

Please find the list of data frames below, and a small cut from the b1 dataframe

Would I need to create a new dataframe for each of the new subsets then combine them into a list later?

Descending_b1 <- listb$b1[order(-Rainfall),]

b1_30 <- Descending_b1[1:30,1:2]

From that, I produce the following

b1_30 <- structure(list(Timestamp = c("25/1/2013", "24/1/2013", "2/2/2004", 
                             "21/3/2010", "16/7/2016", "1/2/2010", "26/1/2007", "29/12/1998", 
                             "24/2/2008", "5/2/2003", "6/2/2003", "11/11/2001", "3/12/2010", 
                             "8/3/2020", "27/12/2010", "29/1/1998", "18/10/2017", "13/3/2007", 
                             "5/4/2006", "10/6/2006", "19/11/2008", "20/2/2015", "26/3/2014", 
                             "15/3/2017", "27/8/2011", "1/3/2013", "27/8/1998", "11/2/2012", 
                             "11/2/2008", "26/1/2013"), 
                        Rainfall = c(238L, 158L, 131L, 131L,129L, 122L, 112L, 109L, 101L, 94L, 
                                     92L, 88L, 82L, 81L, 78L, 74L, 71L, 69L, 65L, 64L, 64L, 
                                     64L, 63L, 63L, 62L, 61L, 60L, 60L, 58L,57L)), 
                   row.names = c(5915L, 5914L, 2640L, 4874L, 7183L, 4826L, 3725L, 939L, 4118L, 2278L, 2279L, 1827L, 5131L, 8514L, 5155L, 
                              605L, 7642L, 3771L, 3429L, 3495L, 4387L, 6671L, 6340L, 7425L, 
                              5398L, 5950L, 815L, 5566L, 4105L, 5916L), class = "data.frame")
b1_30
#>       Timestamp Rainfall
#> 5915  25/1/2013      238
#> 5914  24/1/2013      158
#> 2640   2/2/2004      131
#> 4874  21/3/2010      131
#> 7183  16/7/2016      129
#> 4826   1/2/2010      122
#> 3725  26/1/2007      112
#> 939  29/12/1998      109
#> 4118  24/2/2008      101
#> 2278   5/2/2003       94
#> 2279   6/2/2003       92
#> 1827 11/11/2001       88
#> 5131  3/12/2010       82
#> 8514   8/3/2020       81
#> 5155 27/12/2010       78
#> 605   29/1/1998       74
#> 7642 18/10/2017       71
#> 3771  13/3/2007       69
#> 3429   5/4/2006       65
#> 3495  10/6/2006       64
#> 4387 19/11/2008       64
#> 6671  20/2/2015       64
#> 6340  26/3/2014       63
#> 7425  15/3/2017       63
#> 5398  27/8/2011       62
#> 5950   1/3/2013       61
#> 815   27/8/1998       60
#> 5566  11/2/2012       60
#> 4105  11/2/2008       58
#> 5916  26/1/2013       57

So yeah I hope to do that with the rest of the data frames within the list to create a new data frame whilst keeping the initial data frame name, and then combine them into a new list

Upvotes: 0

Views: 91

Answers (1)

AnilGoyal
AnilGoyal

Reputation: 26218

Suppose you have a list like this

set.seed(2021)
listb <- list(b1 = data.frame(Timestamp = as.Date("2010-01-01") + days(sample(1:100, 10)),
              Rainfall = sample(200:300, 10)),
b2 = data.frame(Timestamp = as.Date("2010-01-01") + days(sample(1:100, 10)),
                Rainfall = sample(200:300, 10)),
b3 = data.frame(Timestamp = as.Date("2010-01-01") + days(sample(1:100, 10)),
                Rainfall = sample(200:300, 10)))

> listb
$b1
    Timestamp Rainfall
1  2010-01-08      275
2  2010-02-08      250
3  2010-02-16      259
4  2010-02-28      217
5  2010-01-13      298
6  2010-03-12      202
7  2010-03-06      245
8  2010-04-10      225
9  2010-03-11      235
10 2010-01-24      285

$b2
    Timestamp Rainfall
1  2010-02-01      242
2  2010-04-09      258
3  2010-01-20      269
4  2010-03-10      285
5  2010-03-28      298
6  2010-01-06      262
7  2010-03-15      278
8  2010-03-05      233
9  2010-02-08      221
10 2010-01-19      215

$b3
    Timestamp Rainfall
1  2010-03-21      216
2  2010-03-30      240
3  2010-01-18      230
4  2010-01-21      272
5  2010-03-10      292
6  2010-04-05      226
7  2010-03-14      210
8  2010-03-25      235
9  2010-03-09      237
10 2010-01-03      278

Now you need to do this only (Needless to say replace n argument in slice_max with your desired n=30)

purrr::map2_dfr(listb, names(listb), ~ .x %>% 
          mutate(list_name = .y) %>%
           slice_max(Rainfall, n=5))

    Timestamp Rainfall list_name
1  2010-01-13      298        b1
2  2010-01-24      285        b1
3  2010-01-08      275        b1
4  2010-02-16      259        b1
5  2010-02-08      250        b1
6  2010-03-28      298        b2
7  2010-03-10      285        b2
8  2010-03-15      278        b2
9  2010-01-20      269        b2
10 2010-01-06      262        b2
11 2010-03-10      292        b3
12 2010-01-03      278        b3
13 2010-01-21      272        b3
14 2010-03-30      240        b3
15 2010-03-09      237        b3

If you want to return the output back into a similar list

purrr::map(listb,  ~ .x %>% 
               slice_max(Rainfall, n=5))

$b1
   Timestamp Rainfall
1 2010-01-13      298
2 2010-01-24      285
3 2010-01-08      275
4 2010-02-16      259
5 2010-02-08      250

$b2
   Timestamp Rainfall
1 2010-03-28      298
2 2010-03-10      285
3 2010-03-15      278
4 2010-01-20      269
5 2010-01-06      262

$b3
   Timestamp Rainfall
1 2010-03-10      292
2 2010-01-03      278
3 2010-01-21      272
4 2010-03-30      240
5 2010-03-09      237

Upvotes: 1

Related Questions