Erdne Htábrob
Erdne Htábrob

Reputation: 879

pivot_longer with column pairs

I am again struggling with transforming a wide df into a long one using pivot_longer The data frame is a result of power analysis for different effect sizes and sample sizes, this is how the original df looks like:

  es_issue_owner es_independence es_party pwr_issue_owner_1200 pwr_independence_1200 pwr_party_1200 pwr_issue_owner_2400 pwr_independence_2400 pwr_party_2400
1            0.1             0.1      0.1                0.087                 0.080          0.081                0.130                 0.163          0.102
2            0.2             0.2      0.2                0.235                 0.273          0.157                0.406                 0.513          0.267

Or with dput:

example <- structure(list(es_issue_owner = c(0.1, 0.2), es_independence = c(0.1, 
0.2), es_party = c(0.1, 0.2), pwr_issue_owner_1200 = c(0.087, 
0.235), pwr_independence_1200 = c(0.08, 0.273), pwr_party_1200 = c(0.081, 
0.157), pwr_issue_owner_2400 = c(0.13, 0.406), pwr_independence_2400 = c(0.163, 
0.513), pwr_party_2400 = c(0.102, 0.267)), row.names = 1:2, class = "data.frame")

Each effect size (es) for three meassures ("independence", "issueowner", "party") is paired with a power calculation on a 1200 and on a 2400 sample size. This is how the output I want to get would look like based on the example above:

           type  es  pwr value
1  independence 0.1 1200 0.080
2   issue_owner 0.1 1200 0.087
3         party 0.1 1200 0.081
4  independence 0.2 1200 0.273
5   issue_owner 0.2 1200 0.235
6         party 0.2 1200 0.157
7  independence 0.1 2400 0.163
8   issue_owner 0.1 2400 0.130
9         party 0.1 2400 0.102
10 independence 0.2 2400 0.513
11  issue_owner 0.2 2400 0.406
12        party 0.2 2400 0.267

or, with dput:

output <- structure(list(type = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L, 1L, 2L, 3L), .Label = c("independence", "issueowner", 
"party"), class = "factor"), es = c(0.1, 0.1, 0.1, 0.2, 0.2, 
0.2, 0.1, 0.1, 0.1, 0.2, 0.2, 0.2), pwr = c(1200, 1200, 1200, 
1200, 1200, 1200, 2400, 2400, 2400, 2400, 2400, 2400), value = c("0.080", 
"0.087", "0.081", "0.273", "0.235", "0.157", "0.163", "0.130", 
"0.102", "0.513", "0.406", "0.267")), out.attrs = list(dim = c(type = 3L, 
es = 2L, pwr = 2L, value = 1L), dimnames = list(type = c("type=independence", 
"type=issueowner", "type=party"), es = c("es=0.1", "es=0.2"), 
    pwr = c("pwr=1200", "pwr=2400"), value = "value=NA")), class = "data.frame", row.names = c(NA, 
-12L))

As a start I tried experimenting with this:

example %>% 
  pivot_longer(cols = everything(),
               names_pattern = "(es_[A-Za-z]+)(pwr_[A-Za-z]+_1200)(pwr_[A-Za-z]+_2400)",
               # names_sep = "(?=\\d)_(?=\\d)",
               names_to = c("es", "pwr_1200", "pwr_2400"),
               values_to = "value")

But it did not work, so I tried from two steps, which sort of works, but the "pairing" gets messed up:

  example %>% 
  # pivot_longer(cols = everything(),
  #              names_pattern = "(es_[A-Za-z]+)(pwr_[A-Za-z]+_1200)(pwr_[A-Za-z]+_2400)",
  #              # names_sep = "(?=\\d)_(?=\\d)",
  #              names_to = c("es", "pwr_1200", "pwr_2400"),
  #              values_to = "value")
  pivot_longer(cols = contains("pwr_"),
               # names_pattern = "es_pwr(.*)1200_pwr(.*)2400",
               names_sep = "_(?=\\d)",
               names_to = c("pwr_type", "pwr_sample"), values_to = "value") %>%
  pivot_longer(cols = contains("es_"),
               # names_pattern = "es_pwr(.*)1200_pwr(.*)2400",
               # names_sep = "_(?=\\d)",
               names_to = "es_type", values_to = "es")

I would appreciate any help!

Upvotes: 3

Views: 244

Answers (1)

Ma&#235;l
Ma&#235;l

Reputation: 52409

library(tidyverse)

example %>% 
  pivot_longer(cols = starts_with("es"), names_to = "type", names_prefix = "es_", values_to = "es") %>%
  pivot_longer(cols = starts_with("pwr"), names_to = "pwr", names_prefix = "pwr_") %>% 
  filter(substr(type, 1, 3) == substr(pwr, 1, 3)) %>% 
  mutate(pwr = parse_number(pwr)) %>% 
  arrange(pwr, es, type)

output

   type            es   pwr value
 1 independence   0.1  1200 0.08 
 2 issue_owner    0.1  1200 0.087
 3 party          0.1  1200 0.081
 4 independence   0.2  1200 0.273
 5 issue_owner    0.2  1200 0.235
 6 party          0.2  1200 0.157
 7 independence   0.1  2400 0.163
 8 issue_owner    0.1  2400 0.13 
 9 party          0.1  2400 0.102
10 independence   0.2  2400 0.513
11 issue_owner    0.2  2400 0.406
12 party          0.2  2400 0.267

Upvotes: 4

Related Questions