PraGalaxy

Reputation: 59

Why does mutate not act like I expect in this code?

I am confused about how mutate in tidyverse/dplyr works. I have included a reproducible example below: one approach uses mutate and the other uses a loop. I would expect both to give the same result, but they do not, and I have no idea why. Any help would be appreciated.

library(tidyverse)
d <- data.frame(x = c('a,a,b,b,b','a,a','a,b,b,b,c,c,c'))
# Approach 1 (mutate)
d %>% 
  mutate(y = paste(unique(str_split(x, ',')[[1]]), collapse = ','))
d
# Approach 2 (loop)
for (i in 1:nrow(d))
{
  d$y[i] <- paste(unique(str_split(d$x[i], ',')[[1]]), collapse = ',')
}
d

I expect the output to be the same for both approaches, but it is not.

Upvotes: 1

Views: 54

Answers (1)

akrun

Reputation: 887048

The issue is that inside mutate, x is the whole column, so [[1]] extracts only the first list element and unique is applied to that element alone. The resulting single string is then recycled to every row. Instead, we need to loop over the list returned by str_split.

library(tidyverse) 
d %>%
     mutate(y = str_split(x, ',') %>%  # output is a list
                   map_chr(~ unique(.x) %>% # loop with map, get the unique elements 
                    toString)) # paste the strings together
#             x       y
#1     a,a,b,b,b    a, b
#2           a,a       a
#3 a,b,b,b,c,c,c a, b, c

In the for loop this was not a problem, because the splitting was done one element at a time with str_split(d$x[i], ',').


To understand this better, note that str_split (like strsplit in base R) is vectorized. It can take multiple strings and split them into a list of vectors, with the list having the same length as the initial vector.

str_split(d$x, ',') # list of length 3
#[[1]]
#[1] "a" "a" "b" "b" "b"

#[[2]]
#[1] "a" "a"

#[[3]]
#[1] "a" "b" "b" "b" "c" "c" "c"

Extracting the first element with [[1]] returns only the split of the first string

str_split(d$x, ',')[[1]]
#[1] "a" "a" "b" "b" "b"
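
As a minimal sketch of what Approach 1 effectively computes inside mutate, the expression can be run on the whole column at once. This uses base R strsplit in place of str_split so it runs without any packages; the variable name first_only is only for illustration.

```r
# The whole column, as mutate sees it (same strings as in the question)
x <- c('a,a,b,b,b', 'a,a', 'a,b,b,b,c,c,c')

# [[1]] keeps only the split of the first string; rows 2 and 3 are ignored
first_only <- paste(unique(strsplit(x, ',')[[1]]), collapse = ',')

first_only          # "a,b"
length(first_only)  # 1, so mutate recycles this one value to every row
```

Because the result has length 1, mutate fills every row of y with the same string, which is why the output differs from the loop.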

In the for loop, we split each string individually and then extract the single element of the resulting length-1 list

str_split(d$x[1], ',')[[1]]
#[1] "a" "a" "b" "b" "b"
str_split(d$x[2], ',')[[1]]
#[1] "a" "a"

That is why we need to loop over the list and get the unique values from each of its elements.
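
For comparison, the same per-element loop can be sketched in base R without the tidyverse. This is one possible alternative, not the only way to do it; it assumes the data from the question and uses ',' as the separator so that the result matches the for loop output exactly (toString above inserts ', ' instead).

```r
# Same data as in the question; stringsAsFactors = FALSE keeps x as character
# on R versions before 4.0
d <- data.frame(x = c('a,a,b,b,b', 'a,a', 'a,b,b,b,c,c,c'),
                stringsAsFactors = FALSE)

# strsplit() returns one character vector per row; vapply() applies
# unique() and paste() to each vector and returns a character vector
d$y <- vapply(strsplit(d$x, ',', fixed = TRUE),
              function(parts) paste(unique(parts), collapse = ','),
              character(1))
d
```

vapply with character(1) checks that each element yields exactly one string, which guards against the kind of silent recycling seen in Approach 1.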

Upvotes: 1
