Thijs
Thijs

Reputation: 189

"object not found error" for columns added to list column tibble with purrr pmap

I've got a dataset to which I want to apply the following operations:

  1. Nest data by a set of columns
  2. Modify the 'data' list column to a wide format
  3. Use both the 'data' list column and values in the grouping columns to create a new column in the 'data' list column, using the columns created under 2. as inputs as well.

1 & 2 work fine using purrr and dplyr, but in step 3 when I try to reference one of the columns created in step 2 it gives me an 'object not found' error. It works fine if I reference only columns in the 'data' list column that already existed before step 2. If I inspect the contents of the list column after step 2 everything referenced in step 3 is there so why is the newly created column not recognized?

Reprex

library(tidyverse)

mtcarsT <- mtcars %>% as_tibble() %>%
  group_by(cyl, gear, vs) %>%
  mutate(cp_flag = rep(c('C', 'P'), length.out = n())) %>%
# step 1, works fine.
  nest() %>%
  mutate(data = map(data, ~ .x %>%
                      group_by(mpg, disp, hp, drat, am, carb) %>%
# step 2, also works fine, generates new columns 'wt_C', 'wt_P', 'qsec_C', 'qsec_P' in 'data'
                      pivot_wider(names_from = cp_flag, values_from = wt:qsec) %>%
                      ungroup()))

# > mtcarsT[1,]
# A tibble: 1 x 4
# Groups:   cyl, vs, gear [1]
#    cyl    vs  gear data             
#  <dbl> <dbl> <dbl> <list>           
# 1     6     0     4 <tibble [1 x 10]>
#
# > mtcarsT$data[[1]]
# A tibble: 1 x 10
#    mpg  disp    hp  drat    am  carb  wt_C  wt_P qsec_C qsec_P
#  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>
#  1    21   160   110   3.9     1     4  2.62  2.88   16.5   17.0


# step 3A: this one works fine when only referencing columns in 'data' that already existed before step 2.
mtcarsT %>%
  mutate(data = pmap(.l = list(a = data, b = vs, c = gear),
                     .f = function(a, b, c) a %>% 
                       dplyr::mutate(vs_gear = carb - b + c)))  

# > .Last.value$data[[1]]
# A tibble: 1 x 11
#    mpg  disp    hp  drat    am  carb  wt_C  wt_P qsec_C qsec_P vs_gear
#  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>   <dbl>
# 1    21   160   110   3.9     1     4  2.62  2.88   16.5   17.0       8


# Step 3B: this is what I want to do, use the column 'wt_P' in the 'data' list column 
# that was created in step 2 along with other columns 'vs' and 'gear' in the nested tibble, 
# but it throws the error 'object wt_P not found'
mtcarsT %>%
  mutate(data = pmap(.l = list(a = data, b = vs, c = gear),
                     .f = function(a, b, c) a %>% 
                       dplyr::mutate(vs_gear = wt_P - b + c)))

# Error: object 'wt_P' not found
# Called from: mutate_impl(.data, dots, caller_env())


'''

I'm using R 3.6.2 x64 with tidyverse 1.3.0 inside RStudio 1.2.5033 on Windows 10.




Upvotes: 0

Views: 1042

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 389145

The example works as expected but the issue is present in step 1 itself.

To show that the example works you can change wt_P to wt_C and it works.

library(tidyverse)

mtcarsT %>%
   mutate(data = pmap(.l = list(a = data, b = vs, c = gear),
                      .f = function(a, b, c)
                             a %>% dplyr::mutate(vs_gear = wt_C - b + c)))

#     cyl    vs  gear data              
#   <dbl> <dbl> <dbl> <list>            
# 1     6     0     4 <tibble [1 × 11]> 
# 2     4     1     4 <tibble [8 × 11]> 
# 3     6     1     3 <tibble [2 × 11]> 
#....

The issue in step 1 is when you are doing

mtcars %>% as_tibble() %>%
    group_by(cyl, gear, vs) %>%
    mutate(cp_flag = rep(c('C', 'P'), length.out = n()))

there are certain groups with only 1 observation who do not get P value at all.

mtcars %>%  count(cyl, gear, vs)

# A tibble: 10 x 4
#     cyl  gear    vs     n
#   <dbl> <dbl> <dbl> <int>
# 1     4     3     1     1
# 2     4     4     1     8
# 3     4     5     0     1
# 4     4     5     1     1
# 5     6     3     1     2
# 6     6     4     0     2
# 7     6     4     1     2
# 8     6     5     0     1
# 9     8     3     0    12
#10     8     5     0     2 

Hence, wt_P is not calculated for them and it returns an error whereas wt_C does not. If you change order in rep from c('C', 'P') to c('P', 'C') then you will get error for wt_C and wt_P would work as expected.


To add the missing column, we can do :

mtcars %>%
   group_by(cyl, gear, vs) %>% 
   mutate(cp_flag = rep(c('C', 'P'), length.out = n())) %>% 
   nest() %>% 
   mutate(data = map(data, ~{ 
                       temp <- .x %>% 
                       group_by(mpg, disp, hp, drat, am, carb) %>% 
                       pivot_wider(names_from = cp_flag, values_from = wt:qsec, 
                       values_fill = list(wt = NA, qsec = NA)) %>% 
                       ungroup()
                       temp[setdiff(cols, names(temp))] <- NA;temp
         })) 


#     cyl    vs  gear data              
#   <dbl> <dbl> <dbl> <list>            
# 1     6     0     4 <tibble [1 × 10]> 
# 2     4     1     4 <tibble [8 × 10]> 
# 3     6     1     3 <tibble [2 × 10]> 
# 4     8     0     3 <tibble [12 × 10]>
# 5     6     1     4 <tibble [2 × 10]> 
# 6     4     1     3 <tibble [1 × 10]> 
# 7     4     0     5 <tibble [1 × 10]> 
# 8     4     1     5 <tibble [1 × 10]> 
# 9     8     0     5 <tibble [2 × 10]> 
#10     6     0     5 <tibble [1 × 10]> 

So all of them have same number of columns.

Upvotes: 1

Related Questions