dfrankow
dfrankow

Reputation: 21387

How to use names_pattern in pivot_longer?

Suppose I have this code trying to extract x and y from df:

df <- data.frame(
  num = c(1,2),
  x_cap = c(4,5),
  x_cap_rolling = c(4.4,5.5),
  y_cap = c(7,8),
  y_cap_rolling = c(7.7,8.8)
)

df_long <- df %>% pivot_longer(cols=!num,
               names_pattern = "(.+)_cap",
               names_to = "type", values_to="cap")

It gets:

> df_long
# A tibble: 8 x 3
    num type    cap
  <dbl> <chr> <dbl>
1     1 x       4  
2     1 x       4.4
3     1 y       7  
4     1 y       7.7
5     2 x       5  
6     2 x       5.5
7     2 y       8  
8     2 y       8.8

Hmm, close, but I want the type of cap 4 to be "x" and the type of cap 4.4 to be "x_rolling", and so on.

Here are some failed attempts at that:

df_long <- df %>% pivot_longer(cols=!num,
               names_pattern = "(.+)_cap(_rolling)*",
               names_to = "type", values_to="cap")

df_long <- df %>% pivot_longer(cols=!num,
               names_pattern = "(.+)_cap(.*)",
               names_to = "type", values_to="cap")

Each returns the error:

Error: `regex` should define 1 groups;  found.

I do not understand this error.

How do I extract a result where the type is x, x_rolling, y, y_rolling at the appropriate rows?

Upvotes: 1

Views: 2942

Answers (2)

akrun
akrun

Reputation: 887158

We could mutate after doing the pivoting

library(dplyr)
library(tidyr)
library(stringr)
df %>% 
   pivot_longer(cols = -num, names_to = "type", values_to = "cap") %>% 
   mutate(type = str_remove(type, "_cap"))

-output

# A tibble: 8 x 3
#    num type        cap
#  <dbl> <chr>     <dbl>
#1     1 x           4  
#2     1 x_rolling   4.4
#3     1 y           7  
#4     1 y_rolling   7.7
#5     2 x           5  
#6     2 x_rolling   5.5
#7     2 y           8  
#8     2 y_rolling   8.8

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388982

Since the values that we want to capture is not aligned sequentially you could either rename the columns first based on the data we want to extract before getting it into long format or get the data in long format and then extract the relevant text.

library(dplyr)
library(tidyr)

df %>%
  rename_with(~trimws(sub('cap_?', '', .), whitespace = '_')) %>%
  pivot_longer(cols = -num, 
               names_to = "type", values_to="cap")

#   num type        cap
#  <dbl> <chr>     <dbl>
#1     1 x           4  
#2     1 x_rolling   4.4
#3     1 y           7  
#4     1 y_rolling   7.7
#5     2 x           5  
#6     2 x_rolling   5.5
#7     2 y           8  
#8     2 y_rolling   8.8

Upvotes: 2

Related Questions