M. Wood

Reputation: 587

purrr: error when turning a nested list to a character vector

I have a dataset with some duplicate entries that I want to change to include only unique combinations of values, with a dup_num column to indicate the number of duplicate entries, and a dup_rows column to indicate which rows contain duplicate data.

I implemented a solution based on "Finding duplicate observations of selected variables in a tibble", but it throws a mess of warnings when coercing the list column of row numbers to a character vector. That is not a problem right now, but I want to show this data with DT and Shiny, and the warnings are a problem for that application.

library(tidyverse)

df <- tibble(episode = 1:30,
             day = rep(c("Mon", "Wed", "Fri"), 10),
             name = rep(c(
               "Moe", "Larry", "Curly", "Shemp", "extra"
             ), 6))

chr_dups <- as_mapper( ~ str_c(.x) %>%
                         str_remove_all("[c\\(\\)]"))

df %>%
  nest(episode, .key = "dups") %>%
  mutate(dup_num = map_dbl(dups, nrow),
         dup_rows = map_chr(dups, chr_dups))
#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing
#> # A tibble: 15 x 5
#>    day   name  dups             dup_num dup_rows
#>    <chr> <chr> <list>             <dbl> <chr>   
#>  1 Mon   Moe   <tibble [2 x 1]>       2 1, 16   
#>  2 Wed   Larry <tibble [2 x 1]>       2 2, 17   
#>  3 Fri   Curly <tibble [2 x 1]>       2 3, 18   
#>  4 Mon   Shemp <tibble [2 x 1]>       2 4, 19   
#>  5 Wed   extra <tibble [2 x 1]>       2 5, 20   
#>  6 Fri   Moe   <tibble [2 x 1]>       2 6, 21   
#>  7 Mon   Larry <tibble [2 x 1]>       2 7, 22   
#>  8 Wed   Curly <tibble [2 x 1]>       2 8, 23   
#>  9 Fri   Shemp <tibble [2 x 1]>       2 9, 24   
#> 10 Mon   extra <tibble [2 x 1]>       2 10, 25  
#> 11 Wed   Moe   <tibble [2 x 1]>       2 11, 26  
#> 12 Fri   Larry <tibble [2 x 1]>       2 12, 27  
#> 13 Mon   Curly <tibble [2 x 1]>       2 13, 28  
#> 14 Wed   Shemp <tibble [2 x 1]>       2 14, 29  
#> 15 Fri   extra <tibble [2 x 1]>       2 15, 30

Created on 2019-09-19 by the reprex package (v0.3.0)

I am pretty sure that the problem is in as_mapper().

The reprex above uses representative toy data. The tibble describes some episodes from the Three Stooges, the day each episode ran, and the character who was the protagonist of the episode.

Thanks!

Upvotes: 1

Views: 372

Answers (3)

akrun

Reputation: 887108

It is a warning because the list elements are not atomic: the dups column is a list of tibbles, which we can confirm by pulling the column:

df %>%
  nest(dups = episode)  %>% 
  pull(dups)
#<list_of<tbl_df<episode:integer>>[15]>
#[[1]]
# A tibble: 2 x 1
#  episode
#    <int>
#1       1
#2      16

#[[2]]
# A tibble: 2 x 1
#  episode
#    <int>
#1       2
#2      17
# ...

So dups is a list of tibbles. We can either extract the column from each tibble with pull, or flatten each tibble and then apply the function:

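library(purrr)
df %>%
  nest(dups = episode) %>%
  mutate(dup_num = map_dbl(dups, nrow),
         # flatten_int() turns each one-column tibble into an integer vector
         # before chr_dups sees it; map() keeps dup_rows as a list column
         dup_rows = map(dups, ~ flatten_int(.x) %>%
                          chr_dups))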
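The pull alternative mentioned above could look like the following sketch. It assumes tidyr 1.0.0 or later for the nest(dups = episode) syntax, and it replaces chr_dups with a plain str_c(..., collapse = ", ") call; that substitution is an assumption about the intended output (one string per group), not the original poster's code.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(stringr)

df <- tibble(episode = 1:30,
             day = rep(c("Mon", "Wed", "Fri"), 10),
             name = rep(c("Moe", "Larry", "Curly", "Shemp", "extra"), 6))

res <- df %>%
  nest(dups = episode) %>%
  mutate(dup_num = map_int(dups, nrow),
         # pull() returns the atomic integer column of each nested tibble;
         # collapse joins its values into a single string per group
         dup_rows = map_chr(dups, ~ str_c(pull(.x, episode), collapse = ", ")))

res
```

Because each call returns exactly one string, map_chr can be used and dup_rows becomes an ordinary character column rather than a list column.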

NOTE: It is not clear why the function 'chr_dups' is applied to the 'episode' column, which is numeric. The transformations in it also do not make sense for that input.


If we just need to paste together the elements of 'episode' grouped by the other columns, a one-line base R approach is

aggregate(episode~ day + name, df, toString)
#   day  name episode
#1  Fri Curly   3, 18
#2  Mon Curly  13, 28
#3  Wed Curly   8, 23
#4  Fri extra  15, 30
#5  Mon extra  10, 25
#6  Wed extra   5, 20
#7  Fri Larry  12, 27
#8  Mon Larry   7, 22
#9  Wed Larry   2, 17
#10 Fri   Moe   6, 21
#11 Mon   Moe   1, 16
#12 Wed   Moe  11, 26
#13 Fri Shemp   9, 24
#14 Mon Shemp   4, 19
#15 Wed Shemp  14, 29

Upvotes: 3

Calum You

Reputation: 15072

I think the source of the warning has already been addressed. I'll add that you can do this without mapping, using just vectorised functions.

library(tidyverse)

df <- tibble(episode = 1:30,
             day = rep(c("Mon", "Wed", "Fri"), 10),
             name = rep(c(
               "Moe", "Larry", "Curly", "Shemp", "extra"
             ), 6))

df %>%
  group_by(day, name) %>%
  summarise(
    dup_num = n(),
    dup_rows = str_c(episode, collapse = ", ")
  )
#> # A tibble: 15 x 4
#> # Groups:   day [3]
#>    day   name  dup_num dup_rows
#>    <chr> <chr>   <int> <chr>   
#>  1 Fri   Curly       2 3, 18   
#>  2 Fri   extra       2 15, 30  
#>  3 Fri   Larry       2 12, 27  
#>  4 Fri   Moe         2 6, 21   
#>  5 Fri   Shemp       2 9, 24   
#>  6 Mon   Curly       2 13, 28  
#>  7 Mon   extra       2 10, 25  
#>  8 Mon   Larry       2 7, 22   
#>  9 Mon   Moe         2 1, 16   
#> 10 Mon   Shemp       2 4, 19   
#> 11 Wed   Curly       2 8, 23   
#> 12 Wed   extra       2 5, 20   
#> 13 Wed   Larry       2 2, 17   
#> 14 Wed   Moe         2 11, 26  
#> 15 Wed   Shemp       2 14, 29

Created on 2019-09-19 by the reprex package (v0.3.0)
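The printed result above is still grouped by day (see the "# Groups: day [3]" line). If a plain tibble is preferred before handing the data to something like DT, one option is to ungroup it. This is a sketch of that optional extra step; summary_tbl is a hypothetical name.

```r
library(dplyr)
library(stringr)

df <- tibble(episode = 1:30,
             day = rep(c("Mon", "Wed", "Fri"), 10),
             name = rep(c("Moe", "Larry", "Curly", "Shemp", "extra"), 6))

summary_tbl <- df %>%
  group_by(day, name) %>%
  summarise(
    dup_num = n(),
    dup_rows = str_c(episode, collapse = ", ")
  ) %>%
  ungroup()  # drop the remaining grouping by day

is_grouped_df(summary_tbl)
```

In dplyr 1.0.0 and later, passing .groups = "drop" to summarise() should have the same effect as the trailing ungroup().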

Upvotes: 2

slava-kohut

Reputation: 4233

Just adding to the other answers: you don't have to use purrr for the mapping step. Base R's sapply will do.

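df <- df %>%
  # nest(dups = episode) is the tidyr >= 1.0.0 spelling of nest(episode, .key = "dups")
  nest(dups = episode) %>%
  mutate(dup_num = sapply(dups, nrow),
         # x$episode is an atomic vector, so paste0() has nothing to coerce
         dup_rows = sapply(dups, function(x) paste0(x$episode, collapse = ",")))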
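A stricter base R variant swaps sapply for vapply, which fails with an error if any element does not return the declared type and length. This is a sketch, not the answer's code: the dups list here is a hypothetical stand-in built from two plain data frames, so it runs without tidyr.

```r
# Two small data frames stand in for elements of the nested dups column.
dups <- list(data.frame(episode = c(1L, 16L)),
             data.frame(episode = c(2L, 17L)))

# Each call must return exactly one integer / one string, or vapply() errors.
dup_num  <- vapply(dups, nrow, integer(1))
dup_rows <- vapply(dups, function(x) paste0(x$episode, collapse = ","), character(1))

dup_num
dup_rows
```

The same two vapply calls can be dropped into the mutate() step in place of sapply.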

Upvotes: 1
