Mohamed Yusuf

Reputation: 428

Categorise a variable in a list using the purrr::map_df() function

I have a list of datasets that I obtained from multiple imputation, and I would now like to recategorise a variable within this list. I have tried using the map function from purrr, but I have not had much luck, as the code below shows.

Is it possible to map a function that regroups and recodes a variable using purrr?

# install the pacman package if it is missing, otherwise load it
if (!require(pacman)) install.packages("pacman")

# load the relevant packages using pacman
pacman::p_load(
  dplyr,       # for pipes and data manipulation
  purrr,       # for map functions
  mice )       # for imputation

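# make 10 imputed datasets using mice (run in parallel with parlmice)

nhanes_imp <- parlmice(nhanes,
                       m = 10,
                       cluster.seed = 1234)

# extract the imputed values: a list with one data frame per variable,
# each with one column per imputation
nhanes_imp <- nhanes_imp$imp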



# create function to categorise chl
chl_funct <- function(x) {
  
  if (x == 0) {
    "0 days"
  } else if (x < 100) {
    "< 100"
  } else if (x >= 100 && x < 150) {
    "100 - 149"
  } else if (x >= 150 && x < 200) {
    "150 - 199"
  } else if (x >= 200) {
    ">= 200"
  }
}



# use the new function to categorise the chl var

nhanes_imp %>% 
  map_df(.$chl,
         chl_funct)

When I run the code, I get this error:

 <error/rlang_error>
  Can't convert a `data.frame` object to function
Backtrace:
 1. nhanes_imp %>% map_df(.$chl, chl_funct)
 2. purrr::map_df(., .$chl, chl_funct)
 4. purrr:::as_mapper.default(.f, ...)
 5. rlang::as_function(.f)
 6. rlang:::abort_coercion(x, friendly_type("function"))
  

Upvotes: 1

Views: 172

Answers (2)

akrun

Reputation: 887203

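We can use cut, which is vectorised:

chl_funct <- function(x) {
  # cut() closes intervals on the right by default: (-Inf, 0], (0, 99],
  # (99, 149], ... so, for whole-number chl values as in nhanes, 100 is
  # labelled "100 - 149" rather than "< 100"
  cut(x, breaks = c(-Inf, 0, 99, 149, 199, Inf),
      labels = c("0 days", "< 100", "100 - 149", "150 - 199", ">= 200"))
}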

Then use

library(dplyr)
nhanes_imp$chl <- nhanes_imp$chl %>%
      mutate(across(everything(), chl_funct))
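Which label a boundary value receives depends on the right argument of cut(). A quick base-R check on a few made-up values (x and brks below are illustrative only) shows the interval codes under both settings:

```r
# Illustrative values on and around the category boundaries (made up)
x <- c(0, 50, 100, 149, 150, 200, 250)
brks <- c(-Inf, 0, 100, 150, 200, Inf)

# right = TRUE (the default): intervals are (a, b], so 100 falls in (0, 100]
codes_right <- as.integer(cut(x, breaks = brks, right = TRUE))
# right = FALSE: intervals are [a, b), so 100 falls in [100, 150)
codes_left <- as.integer(cut(x, breaks = brks, right = FALSE))

print(codes_right)  # 1 2 2 3 3 4 5
print(codes_left)   # 2 2 3 3 4 5 5
```

With right = FALSE the 100/150/200 boundaries match the question's labels, but an exact 0 then shares the "< 100" bin, so the "0 days" category needs its own handling either way.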

Upvotes: 2

Ronak Shah

Reputation: 388992

First, your function needs to be vectorised: if() evaluates only a single TRUE/FALSE condition, so it cannot recode a whole column at once. This can be done with ifelse or case_when; if you have many more categories, cut would be better.

library(dplyr)

chl_funct <- function(x) {
  
  case_when(x == 0 ~ "0 days", 
            x < 100 ~ "< 100", 
            x >= 100 & x < 150 ~ "100 - 149", 
            x >= 150 & x < 200 ~ "150 - 199",
            TRUE ~ ">= 200")
}
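For comparison, the ifelse route mentioned above needs no extra packages. A minimal sketch of the same recoding with nested base-R ifelse(), which is vectorised (chl_ifelse is a hypothetical name):

```r
# Nested ifelse(): each level tests the whole vector at once, so the
# function recodes a full column in a single call
chl_ifelse <- function(x) {
  ifelse(x == 0, "0 days",
         ifelse(x < 100, "< 100",
                ifelse(x < 150, "100 - 149",
                       ifelse(x < 200, "150 - 199", ">= 200"))))
}

res <- chl_ifelse(c(0, 87, 100, 187, 230, NA))
print(res)  # "0 days" "< 100" "100 - 149" "150 - 199" ">= 200" NA
```

Unlike the TRUE ~ fallback in the case_when() version, which labels a missing value ">= 200" (case_when treats an NA condition as FALSE), ifelse() leaves NA as NA. Completed imputations should not contain missing chl values, so this mainly matters if the function is reused on raw data.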

You can then apply this function to every column of the dataset in nhanes_imp$chl.

nhanes_imp$chl <- nhanes_imp$chl %>% mutate(across(everything(), chl_funct))
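As for purrr itself: the error in the question comes from the pipe, which inserts nhanes_imp as the first argument, so the call becomes map_df(nhanes_imp, nhanes_imp$chl, chl_funct) (visible in the backtrace) and the data frame nhanes_imp$chl lands in the .f slot. Mapping over the columns of nhanes_imp$chl does work, as long as the mapped function is vectorised, because each call receives a whole column. A minimal sketch, assuming a small made-up data frame standing in for nhanes_imp$chl and a hypothetical helper recode_chl() that uses findInterval() instead of the if chain:

```r
library(purrr)

# Hypothetical stand-in for nhanes_imp$chl (from the $imp element of a
# mids object): one row per imputed value, one column per imputation
chl_imp <- data.frame(c(187, 113, 229), c(100, 0, 150))
names(chl_imp) <- c("1", "2")

# Vectorised recoding: findInterval() returns 0 for x < 100, 1 for
# [100, 150), 2 for [150, 200) and 3 for x >= 200; exact zeros are relabelled
recode_chl <- function(x) {
  labs <- c("< 100", "100 - 149", "150 - 199", ">= 200")
  out <- labs[findInterval(x, c(100, 150, 200)) + 1]
  out[which(x == 0)] <- "0 days"
  out
}

# map() applies recode_chl() to each column and returns a list;
# assigning into chl_imp[] keeps the data-frame shape and column names
chl_imp[] <- map(chl_imp, recode_chl)
print(chl_imp)
```

Assigning into chl_imp[] keeps one column per imputation; map_df() (an alias of map_dfr()) would instead try to row-bind the results.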

Upvotes: 2
