Frank Kravets

Reputation: 63

How do I use purrr::map with a list of data frames to modify column values in specific data frames without changing the other data frames in the list?

# create 3 dataframes with identical column names
survey08 <- data.frame(year = 2008, employed = c(1, 2, 2, 1, 2))
survey09 <- data.frame(year = 2009, employed = c(1, 1, 1, 2, 1))
survey10 <- data.frame(year = 2010, employed = c(2, 1, 1, 1, 1))

# put dataframes into a list
df_list <- list(survey08, survey09, survey10)

# add names for dataframes in list
# names correspond to survey year ('year' column)
names(df_list) <- c("survey08", "survey09", "survey10")

I want to recode values in the employed column (1 = yes, 2 = no) but only in the survey08 and survey09 data frames. For other data frames in the list, I want to retain the original column values (i.e., only modify specific DFs in the list).

I tried the following code, using the year column as a filter:

library(tidyverse)

# modify only values in 'employed' column for DFs 'survey08' and 'survey09' 
# use 'year' column as filter

df_list %>% 
  map(~filter(.x, year %in% 2008:2009)) %>% 
  map(~ .x %>% mutate_at(vars(employed), ~recode_factor(.,`1` = "yes", `2` = "no")))

While this correctly recodes two data frames (survey08 and survey09), it doesn't retain values from other data frames in the list.

Current output:

#> $survey08
#>   year employed
#> 1 2008      yes
#> 2 2008       no
#> 3 2008       no
#> 4 2008      yes
#> 5 2008       no
#> 
#> $survey09
#>   year employed
#> 1 2009      yes
#> 2 2009      yes
#> 3 2009      yes
#> 4 2009       no
#> 5 2009      yes
#> 
#> $survey10
#> [1] year     employed
#> <0 rows> (or 0-length row.names)

Desired output:

$survey08
  year employed
1 2008      yes
2 2008       no
3 2008       no
4 2008      yes
5 2008       no

$survey09
  year employed
1 2009      yes
2 2009      yes
3 2009      yes
4 2009       no
5 2009      yes

$survey10
  year employed
1 2010        2
2 2010        1
3 2010        1
4 2010        1
5 2010        1

Created on 2019-08-24 by the reprex package (v0.3.0)

Upvotes: 2

Views: 1569

Answers (4)

www

Reputation: 39154

A base R solution using lapply and a user-defined function that checks whether year is earlier than 2010.

df_list2 <- lapply(df_list, function(x){
  if (unique(x$year) < 2010){
    x$employed <- as.character(factor(x$employed, levels = c(1, 2), labels = c("yes", "no")))
  }
  return(x)
})

df_list2
# $survey08
#   year employed
# 1 2008      yes
# 2 2008       no
# 3 2008       no
# 4 2008      yes
# 5 2008       no
# 
# $survey09
#   year employed
# 1 2009      yes
# 2 2009      yes
# 3 2009      yes
# 4 2009       no
# 5 2009      yes
# 
# $survey10
#   year employed
# 1 2010        2
# 2 2010        1
# 3 2010        1
# 4 2010        1
# 5 2010        1
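
The check `unique(x$year) < 2010` relies on each data frame holding a single year; from R 4.2.0 onward, an `if` condition of length greater than one is an error. A minimal sketch of a variant that collapses the check to one logical value is shown here. The names `target_years` and `df_list3` are made up for illustration, and the list is rebuilt so the block runs on its own.

```r
# Rebuild the example list from the question
df_list <- list(
  survey08 = data.frame(year = 2008, employed = c(1, 2, 2, 1, 2)),
  survey09 = data.frame(year = 2009, employed = c(1, 1, 1, 2, 1)),
  survey10 = data.frame(year = 2010, employed = c(2, 1, 1, 1, 1))
)

# Assumption: the survey years whose 'employed' column should be recoded
target_years <- 2008:2009

df_list3 <- lapply(df_list, function(x) {
  # all() yields a single TRUE/FALSE even if a data frame held several years
  if (all(x$year %in% target_years)) {
    # Index a lookup vector: 1 -> "yes", 2 -> "no"
    x$employed <- c("yes", "no")[x$employed]
  }
  x
})
```

This keeps `employed` as a character column in the recoded data frames and leaves `survey10` numeric.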

Upvotes: 2

Ronak Shah

Reputation: 388817

If you already know which list elements you want to modify, why not subset only those, recode them, and assign the result back?

library(tidyverse)

df_list[c("survey08", "survey09")] <- df_list[c("survey08", "survey09")] %>%
  map(~ .x %>% mutate_at(vars(employed), ~recode_factor(.,`1` = "yes", `2` = "no")))


df_list
#$survey08
#  year employed
#1 2008      yes
#2 2008       no
#3 2008       no
#4 2008      yes
#5 2008       no

#$survey09
#  year employed
#1 2009      yes
#2 2009      yes
#3 2009      yes
#4 2009       no
#5 2009      yes

#$survey10
#  year employed
#1 2010        2
#2 2010        1
#3 2010        1
#4 2010        1
#5 2010        1
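
If the names are not known up front, the same subset-and-assign idea can be driven by the year column. This is only a sketch under that assumption: `targets` is a hypothetical helper variable, and `purrr::keep` is used to find the elements whose rows all fall in 2008 or 2009.

```r
library(purrr)
library(dplyr)

# Rebuild the example list from the question
df_list <- list(
  survey08 = data.frame(year = 2008, employed = c(1, 2, 2, 1, 2)),
  survey09 = data.frame(year = 2009, employed = c(1, 1, 1, 2, 1)),
  survey10 = data.frame(year = 2010, employed = c(2, 1, 1, 1, 1))
)

# Names of the elements whose rows all belong to the target years
targets <- names(keep(df_list, ~ all(.x$year %in% 2008:2009)))

# Recode only those elements and write them back into the list
df_list[targets] <- map(
  df_list[targets],
  ~ mutate(.x, employed = recode_factor(employed, `1` = "yes", `2` = "no"))
)
```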

Upvotes: 0

Shree

Reputation: 11140

You can use purrr::map_at, which modifies only the elements selected by name or position and returns the rest unchanged.

df_list %>% 
  map_at(c("survey08", "survey09"),
         ~ filter(.x, year %in% 2008:2009)) %>% 
  map_at(c("survey08", "survey09"),
         ~ .x %>% mutate_at(vars(employed), 
         ~ recode_factor(.,`1` = "yes", `2` = "no")))

$survey08
  year employed
1 2008      yes
2 2008       no
3 2008       no
4 2008      yes
5 2008       no

$survey09
  year employed
1 2009      yes
2 2009      yes
3 2009      yes
4 2009       no
5 2009      yes

$survey10
  year employed
1 2010        2
2 2010        1
3 2010        1
4 2010        1
5 2010        1
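
Because map_at already restricts the work to the named elements, the filter step appears to be optional here. A shorter sketch, assuming the same example data (the result name `recoded` is made up for illustration):

```r
library(purrr)
library(dplyr)

# Rebuild the example list from the question
df_list <- list(
  survey08 = data.frame(year = 2008, employed = c(1, 2, 2, 1, 2)),
  survey09 = data.frame(year = 2009, employed = c(1, 1, 1, 2, 1)),
  survey10 = data.frame(year = 2010, employed = c(2, 1, 1, 1, 1))
)

# Recode 'employed' only in the two named data frames;
# survey10 passes through untouched
recoded <- map_at(
  df_list,
  c("survey08", "survey09"),
  ~ mutate(.x, employed = recode_factor(employed, `1` = "yes", `2` = "no"))
)
```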

Upvotes: 3

filups21

Reputation: 1907

Using filter drops every row that does not match the condition, so the data frames you want to keep (here survey10) come back with zero rows. Use map_if instead of map; its .p argument identifies the elements that .f is applied to, and all other elements are returned unchanged.

# .p as a logical vector: one TRUE/FALSE per list element
df_list %>% 
   map_if(., 
      .f = ~ .x %>% mutate_at(vars(employed), ~recode_factor(.,`1` = "yes", `2` = "no")), 
      .p = c(TRUE, TRUE, FALSE))

or

df_list %>% 
   map_if(., 
       .f = ~ .x %>% mutate_at(vars(employed), ~recode_factor(.,`1` = "yes", `2` = "no")), 
       .p = ~ .x %>% pull(year) %>% unique(.) %in% 2008:2009)
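
The predicate version can also be written with named helper functions, which avoids nesting one formula inside another. This is a sketch, not the only way to do it: `needs_recode` and `recode_employed` are hypothetical names, and `across()` (available from dplyr 1.0.0) is used in place of `mutate_at`, which that release marks as superseded.

```r
library(purrr)
library(dplyr)

# Rebuild the example list from the question
df_list <- list(
  survey08 = data.frame(year = 2008, employed = c(1, 2, 2, 1, 2)),
  survey09 = data.frame(year = 2009, employed = c(1, 1, 1, 2, 1)),
  survey10 = data.frame(year = 2010, employed = c(2, 1, 1, 1, 1))
)

# Predicate: TRUE when every row belongs to 2008 or 2009
needs_recode <- function(df) all(df$year %in% 2008:2009)

# Transformation: recode 1/2 in 'employed' to a yes/no factor
recode_employed <- function(df) {
  mutate(df, across(employed, ~ recode_factor(.x, `1` = "yes", `2` = "no")))
}

# Apply the transformation only where the predicate holds
result <- map_if(df_list, needs_recode, recode_employed)
```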

Upvotes: 2
