Reputation: 117
I have a column I'd like to separate:
library(tidyverse)

df <- tibble(
  variable = c("var_a_min", "var_ab_max", "var_abc_mean", "var_abcd_sd"),
  value = c(1, 2, 3, 4)
)
The data look like this:
# A tibble: 4 x 2
variable value
<chr> <dbl>
1 var_a_min 1
2 var_ab_max 2
3 var_abc_mean 3
4 var_abcd_sd 4
I'd like to separate the variable column, such that what's after the last underscore becomes the second column.
df %>% separate(variable, c("variable", "metric"), sep = [after last _])
I tried out some regex, but couldn't figure it out. The data should look like this:
# A tibble: 4 x 3
variable metric value
<chr> <chr> <dbl>
1 var_a min 1
2 var_ab max 2
3 var_abc mean 3
4 var_abcd sd 4
Upvotes: 1
Views: 1531
Reputation: 887128
One option is extract, which captures parts of the string as groups. The first capture group, (.*), is a greedy match of zero or more characters. It is followed by a literal _, and the second group, ([^_]+$), matches one or more characters that are not a _ up to the end of the string ($). Because the second group cannot contain an underscore, the greedy first match has to backtrack to the last _ in the string.
library(tidyverse)
df %>%
extract(variable, into = c("variable", "metric"), "(.*)_([^_]+$)")
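To see what each capture group picks up, the same pattern can be tried with base R's sub. This is only an illustrative sketch and not part of the original answer; the names x, pattern, prefix and metric are made up for the example.

```r
# Base R only: show what each capture group of the pattern matches.
x <- c("var_a_min", "var_ab_max", "var_abc_mean", "var_abcd_sd")
pattern <- "(.*)_([^_]+$)"

# \\1 is the greedy first group (everything before the last underscore)
prefix <- sub(pattern, "\\1", x)
# \\2 is the second group (everything after the last underscore)
metric <- sub(pattern, "\\2", x)

data.frame(variable = prefix, metric = metric)
```

The two vectors correspond to the variable and metric columns that extract creates from the same regex.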
separate accepts regex lookarounds as well, so if every value starts with the prefix 'var', a negative lookbehind can split on the _ that is not immediately preceded by var:
df %>%
separate(variable, into = c("variable", "metric"), "(?<!var)_")
# A tibble: 4 x 3
# variable metric value
# <chr> <chr> <dbl>
#1 var_a min 1
#2 var_ab max 2
#3 var_abc mean 3
#4 var_abcd sd 4
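If the prefix is not always 'var', a lookahead may be a more general way to express "the last underscore". The following is a sketch and not part of the original answer: it assumes separate hands sep to a regex engine that supports lookaheads, in the same way it handles the lookbehind above. The name res is made up for the example.

```r
library(tidyr)
library(tibble)

df <- tibble(
  variable = c("var_a_min", "var_ab_max", "var_abc_mean", "var_abcd_sd"),
  value = c(1, 2, 3, 4)
)

# Split on an underscore only when the rest of the string, up to its end,
# contains no further underscore, i.e. on the last underscore.
res <- separate(df, variable, into = c("variable", "metric"), sep = "_(?=[^_]+$)")
res
```

Unlike the lookbehind version, this does not depend on what precedes the underscore, so it should keep working if a middle segment happens to end in var.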
Upvotes: 5