Reputation: 607
I have a data frame called all.data
that looks like this:
A tibble: 10 x 5
survey_id eb.estimate.2021 se.rescaled.2021 eb.estimate.2022 se.rescaled.2022
<dbl> <dbl> <dbl> <dbl> <dbl>
1 606 4.93 1.11 5.70 1.23
2 610 7.27 0.696 7.35 0.579
3 571 2.37 0.636 0.345 0.998
4 590 6.00 0.659 6.36 0.791
5 616 2.79 0.825 5.08 1.17
6 617 3.11 0.633 1.68 0.779
7 596 5.97 0.857 5.36 0.994
8 581 5.26 0.975 4.79 0.735
9 577 7.11 0.903 5.52 0.999
10 598 5.10 0.650 4.71 0.776
I want to pivot_longer so it looks like this:
A tibble: 10 x 4
survey_id year se eb.est
<dbl> <chr> <dbl> <dbl>
1 606 2021 1.11 4.93
2 606 2022 1.23 5.70
3 610 2021 0.696 7.27
4 610 2022 0.579 7.35
5 571 2021 0.636 2.37
6 571 2022 0.998 0.345
7 590 2021 0.659 6.00
8 590 2022 0.791 6.36
9 616 2021 0.825 2.79
10 616 2022 1.17 5.08
I was able to do it by doing pivot_longer
twice: once for the eb.estimate
s and once for the se.rescaled
s and then merging them by year
and survey_id
. One of these pivot_longer
s is like this:
foo1 <- all.data %>%
pivot_longer(cols=c(se.rescaled.2021, se.rescaled.2022), names_to="year",
names_pattern="(?:se\\.rescaled\\.)(\\d{4})",
values_to="se") %>%
mutate(se=as.numeric(se))
This method offends my aesthetic and efficiency senses. There should be a way to do it in one pivot_longer
but I'm not having success. I tried this:
all.data.long2 <- all.data %>%
pivot_longer(cols=c("eb.estimate.2021", "eb.estimate.2022", "se.rescaled.2021", "se.rescaled.2022"),
names_to= ".value",
names_pattern="((eb\\.estimate|se\\.rescaled))(?:\\.)(\\d{4})")
But this gives me an error message:
Error:
regex should define 1 groups; 3 found.
(Actually, I don't understand why it found 3 groups; there are only 2 capturing groups in that regex.)
What I thought would be logical is to be able to select the matched segments with an index as we do with a function like str_match()
, sort of like this:
names_to=names_pattern[3]
values_to=names_pattern[1]
Is there some way to do this? Thanks in advance
Upvotes: 1
Views: 143
Reputation: 388982
You may use -
tidyr::pivot_longer(all.data, cols = -survey_id,
names_to = c('.value', 'year'),
names_pattern = '(eb.est|se).*\\.(\\d+)')
# survey_id year eb.est se
# <int> <chr> <dbl> <dbl>
# 1 606 2021 4.93 1.11
# 2 606 2022 5.7 1.23
# 3 610 2021 7.27 0.696
# 4 610 2022 7.35 0.579
# 5 571 2021 2.37 0.636
# 6 571 2022 0.345 0.998
# 7 590 2021 6 0.659
# 8 590 2022 6.36 0.791
# 9 616 2021 2.79 0.825
#10 616 2022 5.08 1.17
#11 617 2021 3.11 0.633
#12 617 2022 1.68 0.779
#13 596 2021 5.97 0.857
#14 596 2022 5.36 0.994
#15 581 2021 5.26 0.975
#16 581 2022 4.79 0.735
#17 577 2021 7.11 0.903
#18 577 2022 5.52 0.999
#19 598 2021 5.1 0.65
#20 598 2022 4.71 0.776
.value
corresponds to the pattern (eb.est|se)
and the number (\\d+
) is saved in a new column called year
.
data
all.data <- structure(list(survey_id = c(606L, 610L, 571L, 590L, 616L, 617L,
596L, 581L, 577L, 598L), eb.estimate.2021 = c(4.93, 7.27, 2.37,
6, 2.79, 3.11, 5.97, 5.26, 7.11, 5.1), se.rescaled.2021 = c(1.11,
0.696, 0.636, 0.659, 0.825, 0.633, 0.857, 0.975, 0.903, 0.65),
eb.estimate.2022 = c(5.7, 7.35, 0.345, 6.36, 5.08, 1.68,
5.36, 4.79, 5.52, 4.71), se.rescaled.2022 = c(1.23, 0.579,
0.998, 0.791, 1.17, 0.779, 0.994, 0.735, 0.999, 0.776)),
class = "data.frame", row.names = c(NA, -10L))
Upvotes: 2