Adam_G

Reputation: 7879

Convert list to string with conditions

I have a dataframe that looks like:

x <- tibble(
  experiment_id = rep(c('1a','1b'),each=5),
  keystroke = rep(c('a','SHIFT','b','SPACE','e'),2)
)

I know I can keep only certain values of a vector, as below, and then concatenate them into a string using str_c or str_flatten:

> y <- c('b','a','SPACE','d')
> y[y %in% letters]
[1] "b" "a" "d"
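For completeness, a minimal sketch of the concatenation step on the same vector. It uses base R's paste() with collapse = "" as a stand-in for str_c(collapse = "") or str_flatten(), so it runs without stringr; the name word is only illustrative.

```r
y <- c('b', 'a', 'SPACE', 'd')

# Keep only the single lowercase letters, then glue them into one string
word <- paste(y[y %in% letters], collapse = "")
word
# [1] "bad"
```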

But when I try the same thing in a grouped pipe:

x_out <- x %>%
  group_by(experiment_id) %>%
  mutate(grp = cumsum(lag(keystroke == 'SPACE', default = 0))) %>%
  group_by(grp, .add = TRUE) %>%
  mutate(
    within_keystrokes = list(keystroke),
    within_word = within_keystrokes[within_keystrokes %in% letters]
  ) %>%
  ungroup()

I get the error:

Error: Problem with `mutate()` input `within_word`.
x Input `within_word` can't be recycled to size 2.
ℹ Input `within_word` is `within_keystrokes[within_keystrokes %in% letters]`.
ℹ Input `within_word` must be size 2 or 1, not 0.
ℹ The error occurred in group 1: experiment_id = "1a", grp = 0.

I read this answer and tried using ifelse but still ran into errors.

Any insight into what I'm doing wrong?

EDIT (expected output): Sorry for not including this. I would expect the final df to look like:

    x <- tibble(
      experiment_id = rep(c('1a','1b'),each=5),
      keystroke = rep(c('a','SHIFT','b','SPACE','e'),2),
      within_keystrokes = list(list('a','SHIFT','b','SPACE'), 
                          list('a','SHIFT','b','SPACE'), 
                          list('a','SHIFT','b','SPACE'), 
                          list('a','SHIFT','b','SPACE'),
                          'e',
                          list('a','SHIFT','b','SPACE'), 
                          list('a','SHIFT','b','SPACE'), 
                          list('a','SHIFT','b','SPACE'), 
                          list('a','SHIFT','b','SPACE'),
                          'e'),
      within_word = rep(list('ab','ab','ab','ab','e'),2)
)

Upvotes: 0

Views: 58

Answers (1)

Martin Gal

Reputation: 16988

You almost solved your issue. Filter keystroke itself rather than the list column, and collapse the matching letters with str_c. You could use

library(dplyr)
library(stringr)

x %>%
  group_by(experiment_id, grp = cumsum(lag(keystroke == "SPACE", default = 0))) %>% 
  mutate(
    within_keystrokes = list(keystroke),
    within_word = list(str_c(keystroke[keystroke %in% letters], collapse = ""))
    )

to get

# A tibble: 10 x 4
   experiment_id keystroke within_keystrokes within_word
   <chr>         <chr>     <list>            <list>     
 1 1a            a         <list [4]>        <chr [1]>  
 2 1a            SHIFT     <list [4]>        <chr [1]>  
 3 1a            b         <list [4]>        <chr [1]>  
 4 1a            SPACE     <list [4]>        <chr [1]>  
 5 1a            e         <chr [1]>         <chr [1]>  
 6 1b            a         <list [4]>        <chr [1]>  
 7 1b            SHIFT     <list [4]>        <chr [1]>  
 8 1b            b         <list [4]>        <chr [1]>  
 9 1b            SPACE     <list [4]>        <chr [1]>  
10 1b            e         <chr [1]>         <chr [1]> 

If you don't want within_word to be a list column, just drop the list() wrapper around str_c().
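As for why the original attempt failed: as far as I can tell, %in% applied to a list column compares each list element as a whole, so an element holding several keys never matches a single letter and the subset comes back empty (hence the "not 0" in the error). A base R sketch of that behaviour, where group_keys and list_col are illustrative names standing in for one group's keystroke values and for what list(keystroke) is recycled to inside that group:

```r
group_keys <- c("a", "SHIFT", "b", "SPACE")

# Inside a 4-row group, list(keystroke) is recycled to four copies of the vector
list_col <- rep(list(group_keys), 4)

# Each element is a whole character vector, so none of them equals a letter
hits <- list_col %in% letters
length(list_col[hits])   # 0, which mutate() cannot recycle to the group size

# Filtering the plain character vector is done element by element instead
kept <- group_keys[group_keys %in% letters]
kept
# [1] "a" "b"
```

This is why the code above works on keystroke directly and only wraps the final result in list().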

Upvotes: 1
