Reputation: 21387
Suppose I have this code trying to extract x and y from df:
df <- data.frame(
num = c(1,2),
x_cap = c(4,5),
x_cap_rolling = c(4.4,5.5),
y_cap = c(7,8),
y_cap_rolling = c(7.7,8.8)
)
df_long <- df %>% pivot_longer(cols=!num,
names_pattern = "(.+)_cap",
names_to = "type", values_to="cap")
It gets:
> df_long
# A tibble: 8 x 3
num type cap
<dbl> <chr> <dbl>
1 1 x 4
2 1 x 4.4
3 1 y 7
4 1 y 7.7
5 2 x 5
6 2 x 5.5
7 2 y 8
8 2 y 8.8
Hmm, close, but I want the type of cap 4 to be "x" and the type of cap 4.4 to be "x_rolling", and so on.
Here are some failed attempts at that:
df_long <- df %>% pivot_longer(cols=!num,
names_pattern = "(.+)_cap(_rolling)*",
names_to = "type", values_to="cap")
df_long <- df %>% pivot_longer(cols=!num,
names_pattern = "(.+)_cap(.*)",
names_to = "type", values_to="cap")
Each returns the error:
Error: `regex` should define 1 groups; found.
I do not understand this error.
How do I extract a result where the type is x, x_rolling, y, y_rolling at the appropriate rows?
Upvotes: 1
Views: 2942
Reputation: 887158
We could mutate
after doing the pivoting
library(dplyr)
library(tidyr)
library(stringr)
df %>%
pivot_longer(cols = -num, names_to = "type", values_to = "cap") %>%
mutate(type = str_remove(type, "_cap"))
-output
# A tibble: 8 x 3
# num type cap
# <dbl> <chr> <dbl>
#1 1 x 4
#2 1 x_rolling 4.4
#3 1 y 7
#4 1 y_rolling 7.7
#5 2 x 5
#6 2 x_rolling 5.5
#7 2 y 8
#8 2 y_rolling 8.8
Upvotes: 1
Reputation: 388982
Since the values that we want to capture is not aligned sequentially you could either rename the columns first based on the data we want to extract before getting it into long format or get the data in long format and then extract the relevant text.
library(dplyr)
library(tidyr)
df %>%
rename_with(~trimws(sub('cap_?', '', .), whitespace = '_')) %>%
pivot_longer(cols = -num,
names_to = "type", values_to="cap")
# num type cap
# <dbl> <chr> <dbl>
#1 1 x 4
#2 1 x_rolling 4.4
#3 1 y 7
#4 1 y_rolling 7.7
#5 2 x 5
#6 2 x_rolling 5.5
#7 2 y 8
#8 2 y_rolling 8.8
Upvotes: 2