Mukhtar Abdi

Reputation: 411

How to change certain columns in a list of data.frames to factor in R?

I have a list of data.frames (in my case nested: a list of lists of data.frames) and want to convert certain columns to factors. The columns I want to convert are c("station", "season"). I have tried various approaches, but none of them worked for me.

Any help, please?

Here is code that creates data representing my dataset.

> df1 <- data.frame(station = c("MADA1", "MADA2", "MADA3", "MADA4", "MADA5"),
+                  rainfall = c(0, 5, 10, 15, 20),
+                  yield = c(2000, 3000, 4000, 5000, 6000),
+                  season = c('S1', 'S1', 'S2', 'S2', 'S1'))
> df2 <- df1
> df3 <- df1
> 
> list_1 <- list(df1, df2, df3)
> list_2 <- list(df1, df2, df3)
> mainlist <- list(list_1, list_2)
> 
> lapply(mainlist, head)
[[1]]
[[1]][[1]]
  station rainfall yield season
1   MADA1        0  2000     S1
2   MADA2        5  3000     S1
3   MADA3       10  4000     S2
4   MADA4       15  5000     S2
5   MADA5       20  6000     S1

[[1]][[2]]
  station rainfall yield season
1   MADA1        0  2000     S1
2   MADA2        5  3000     S1
3   MADA3       10  4000     S2
4   MADA4       15  5000     S2
5   MADA5       20  6000     S1

[[1]][[3]]
  station rainfall yield season
1   MADA1        0  2000     S1
2   MADA2        5  3000     S1
3   MADA3       10  4000     S2
4   MADA4       15  5000     S2
5   MADA5       20  6000     S1


[[2]]
[[2]][[1]]
  station rainfall yield season
1   MADA1        0  2000     S1
2   MADA2        5  3000     S1
3   MADA3       10  4000     S2
4   MADA4       15  5000     S2
5   MADA5       20  6000     S1

[[2]][[2]]
  station rainfall yield season
1   MADA1        0  2000     S1
2   MADA2        5  3000     S1
3   MADA3       10  4000     S2
4   MADA4       15  5000     S2
5   MADA5       20  6000     S1

[[2]][[3]]
  station rainfall yield season
1   MADA1        0  2000     S1
2   MADA2        5  3000     S1
3   MADA3       10  4000     S2
4   MADA4       15  5000     S2
5   MADA5       20  6000     S1


Upvotes: 2

Views: 122

Answers (3)

PaulS

Reputation: 25323

A possible solution, based on rrapply::rrapply (recursive apply):

rrapply::rrapply(
  mainlist,
  condition = \(x, .xname) .xname %in% c("station", "season"),
  f = \(x) as.factor(x)
)

#> List of 2
#>  $ :List of 3
#>   ..$ :'data.frame': 5 obs. of  4 variables:
#>   .. ..$ station : Factor w/ 5 levels "MADA1","MADA2",..: 1 2 3 4 5
#>   .. ..$ rainfall: num [1:5] 0 5 10 15 20
#>   .. ..$ yield   : num [1:5] 2000 3000 4000 5000 6000
#>   .. ..$ season  : Factor w/ 2 levels "S1","S2": 1 1 2 2 1
#>   ..$ :'data.frame': 5 obs. of  4 variables:
#>   .. ..$ station : Factor w/ 5 levels "MADA1","MADA2",..: 1 2 3 4 5
#>   .. ..$ rainfall: num [1:5] 0 5 10 15 20
#>   .. ..$ yield   : num [1:5] 2000 3000 4000 5000 6000
#>   .. ..$ season  : Factor w/ 2 levels "S1","S2": 1 1 2 2 1
#> ...
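If installing a package is not an option, the same recursive idea can be sketched in base R. This is only an illustration, not part of rrapply: the helper name `factorize_nested` is hypothetical, and it assumes every leaf of the nested list is either a data.frame or something that should be left untouched.

```r
# Example data from the question.
df1 <- data.frame(
  station  = c("MADA1", "MADA2", "MADA3", "MADA4", "MADA5"),
  rainfall = c(0, 5, 10, 15, 20),
  yield    = c(2000, 3000, 4000, 5000, 6000),
  season   = c("S1", "S1", "S2", "S2", "S1")
)
mainlist <- list(list(df1, df1, df1), list(df1, df1, df1))

# Hypothetical helper: walk a nested list of any depth and convert the
# requested columns of every data.frame it finds to factors.
factorize_nested <- function(x, cols) {
  if (is.data.frame(x)) {
    # Only touch columns that actually exist in this data.frame.
    hit <- intersect(cols, names(x))
    if (length(hit) > 0) {
      x[hit] <- lapply(x[hit], as.factor)
    }
    x
  } else if (is.list(x)) {
    # A plain list: recurse into each element.
    lapply(x, factorize_nested, cols = cols)
  } else {
    # Anything else is returned unchanged.
    x
  }
}

result_rec <- factorize_nested(mainlist, c("station", "season"))

# The same helper also handles uneven nesting.
uneven <- factorize_nested(list(df1, list(df1, list(df1))), c("station", "season"))
```

The data.frame check has to come before the list check, because a data.frame is itself a list and would otherwise be recursed into column by column.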

Upvotes: 2

Martin Gal

Reputation: 16978

You could use a nested lapply, which is a bit cumbersome:

lapply(mainlist, 
       function(.x, .cols) { 
         lapply(.x, 
                function(.y) {
                  .y[.cols] <- lapply(.y[.cols], as.factor)
                  return(.y) 
                  }
                )
         }, 
       .cols = c("station", "season")
       )
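One thing that is easy to miss: lapply returns a new list and leaves mainlist untouched, so the result has to be assigned. A minimal sketch of the same nested lapply with the inner step pulled out into a helper, followed by a check that the conversion worked (the names `to_factor_cols`, `result` and `all_factor` are made up for illustration):

```r
# Example data from the question.
df1 <- data.frame(
  station  = c("MADA1", "MADA2", "MADA3", "MADA4", "MADA5"),
  rainfall = c(0, 5, 10, 15, 20),
  yield    = c(2000, 3000, 4000, 5000, 6000),
  season   = c("S1", "S1", "S2", "S2", "S1")
)
mainlist <- list(list(df1, df1, df1), list(df1, df1, df1))

cols <- c("station", "season")

# Hypothetical helper: convert the named columns of one data.frame to factors.
to_factor_cols <- function(df, cols) {
  df[cols] <- lapply(df[cols], as.factor)
  df
}

# Outer lapply walks the top-level list, inner lapply walks the data.frames.
# The result is assigned, because mainlist itself is not modified.
result <- lapply(mainlist, function(inner) lapply(inner, to_factor_cols, cols = cols))

# Flatten one level to get a plain list of data.frames, then check that
# every target column in every data.frame is a factor.
all_factor <- all(vapply(
  unlist(result, recursive = FALSE),
  function(df) all(vapply(df[cols], is.factor, logical(1))),
  logical(1)
))
all_factor
```

To overwrite the original object, assign back to it instead: mainlist <- lapply(...).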

Upvotes: 1

Julian

Reputation: 9240

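A purrr approach (map_depth with a depth of 2 reaches the second list "layer", i.e. the data.frames themselves):

library(purrr); library(dplyr)  # map_depth() is from purrr; mutate() and across() are from dplyr
mainlist %>% map_depth(2, ~ .x %>% mutate(across(c("station", "season"), as.factor)))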

Upvotes: 3
