separate_rows separates the delimited values in a column into multiple rows, repeating the values of the other columns.
> t <- tibble(x = c("a,b", "c,d"), v = c(1,2))
> t %>% separate_rows(x, sep = ",")
# A tibble: 4 × 2
x v
<chr> <dbl>
1 a 1
2 b 1
3 c 2
4 d 2
However, what if I want to apply a function as part of this? For example, after separating, change each value of x to TRUE if it is in c("a", "b") and FALSE otherwise.
I understand that all I need to do is add a mutate after separate_rows. My question is whether there is already a function that both separates and processes a comma-delimited value, and how I could use such a function in the same way as separate_rows. (The reason is that I want to move complex split logic into a function rather than keep it in mutate.)
For example, the function below implements the logic above and returns a vector of values. Is it possible to use it to perform an operation similar to separate_rows (i.e. split the column and repeat the values of the other columns)?
proc <- function(text){
text %>%
str_split(pattern = ",") %>%
unlist() %>%
sapply(function(x){
if(x %in% c("a", "b"))
return(T)
else
return(F)
})
}
Kind of.
If you keep the output of your function (here proc) in list form instead of unlisting it, you can apply that function to x with mutate and then unnest x. Keeping it in list form preserves the info about which element of proc(t$x) corresponds to which row of t, and that info is lost when you unlist.
library(tidyr)
library(stringr)
library(dplyr, warn.conflicts = FALSE)
proc <- function(text) {
text %>%
str_split(pattern = ",") %>%
lapply(function(x) {
x %in% c("a", "b")
})
}
t <- tibble(x = c("a,b", "c,d"), v = c(1,2))
t %>%
mutate(x = proc(x)) %>%
unnest(x)
#> # A tibble: 4 × 2
#> x v
#> <lgl> <dbl>
#> 1 TRUE 1
#> 2 TRUE 1
#> 3 FALSE 2
#> 4 FALSE 2
Created on 2022-02-20 by the reprex package (v2.0.1)
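To see why the list form matters, here is a minimal check using the same t. It calls str_split directly rather than proc, and the variable names split_list, split_flat and res are only illustrative. The list has one element per row of t, while the unlisted vector has one element per piece, so the flattened result cannot be put back into a 2-row tibble.

```r
library(stringr)
library(dplyr, warn.conflicts = FALSE)

t <- tibble(x = c("a,b", "c,d"), v = c(1, 2))

# str_split() returns one list element per input string,
# so the list still lines up with the rows of t
split_list <- str_split(t$x, pattern = ",")
length(split_list)  # 2

# unlist() flattens to one element per piece, losing the row association
split_flat <- unlist(split_list)
length(split_flat)  # 4

# A length-4 vector cannot be assigned to a column of a 2-row tibble,
# so mutate() raises an error here
res <- tryCatch(
  mutate(t, x = split_flat %in% c("a", "b")),
  error = function(e) "error: length mismatch"
)
res
```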
But, if you're going to use two functions anyway (mutate and unnest), you may as well just use separate_rows and then mutate.
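For reference, the separate_rows-then-mutate version of the same transformation might look like the sketch below (assuming the same t as in the question). The split happens first, and the TRUE/FALSE recoding is an ordinary mutate on the already-separated rows.

```r
library(tidyr)
library(dplyr, warn.conflicts = FALSE)

t <- tibble(x = c("a,b", "c,d"), v = c(1, 2))

# Give each comma-delimited piece of x its own row (v is repeated),
# then recode x to TRUE/FALSE in a separate mutate() step
res <- t %>%
  separate_rows(x, sep = ",") %>%
  mutate(x = x %in% c("a", "b"))
res
```

This should give the same 4-row result as the unnest approach: x is TRUE, TRUE, FALSE, FALSE and v is 1, 1, 2, 2.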
Or, you could pack everything into the proc function.
library(tidyr)
library(stringr)
library(dplyr, warn.conflicts = FALSE)
proc <- function(df, col) {
fun <- function(text) {
text %>%
str_split(pattern = ",") %>%
lapply(function(x) {
x %in% c("a", "b")
})
}
df %>%
mutate(across({{ col }}, fun)) %>%
unnest({{ col }})
}
t <- tibble(x = c("a,b", "c,d"), v = c(1,2))
t %>%
proc(x)
#> # A tibble: 4 × 2
#> x v
#> <lgl> <dbl>
#> 1 TRUE 1
#> 2 TRUE 1
#> 3 FALSE 2
#> 4 FALSE 2
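If the goal is to keep the split logic in its own function, one possible generalization is to pass both the separator and the per-row function as arguments. The sketch below is hypothetical: separate_rows_with, fn and in_ab are illustrative names, not part of tidyr. It follows the same mutate(across(...)) plus unnest pattern as proc above.

```r
library(tidyr)
library(stringr)
library(dplyr, warn.conflicts = FALSE)

# Hypothetical helper (not a tidyr function): split `col` on `sep`,
# apply `fn` to each row's vector of pieces, then unnest so that
# every piece gets its own row and the other columns are repeated
separate_rows_with <- function(df, col, fn, sep = ",") {
  split_apply <- function(text) {
    lapply(str_split(text, pattern = sep), fn)
  }
  df %>%
    mutate(across({{ col }}, split_apply)) %>%
    unnest({{ col }})
}

t <- tibble(x = c("a,b", "c,d"), v = c(1, 2))

# The recoding logic lives in an ordinary function passed as `fn`
in_ab <- function(pieces) pieces %in% c("a", "b")

res <- separate_rows_with(t, x, in_ab)
res
```

Because fn receives all pieces from one row at once, it can also drop or add elements, and unnest will repeat v accordingly.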