Paryl
Paryl

Reputation: 221

R dplyr - Rearrange columns by pattern of names

I've got some long format data that 1) needs to be reshaped to wide and then 2) needs the columns resorted according pattern of their names. The example data is below:

#Orignial data
set.seed(100)
long_df <- tibble(id = rep(1:5, each = 3),
                  group = rep(c('g1','g2','g3'), times = 5),
                  mean = runif(15, min = 1, max = 10),
                  sd = runif(15, min = .025, max = 1))
long_df

# A tibble: 15 x 4
      id group  mean    sd
   <int> <chr> <dbl> <dbl>
 1     1 g1     3.77 0.677
 2     1 g2     3.32 0.224
 3     1 g3     5.97 0.374
 4     2 g1     1.51 0.375
 5     2 g2     5.22 0.698
 6     2 g3     5.35 0.547
 7     3 g1     8.31 0.718
 8     3 g2     4.33 0.550
 9     3 g3     5.92 0.755
10     4 g1     2.53 0.435
11     4 g2     6.62 0.192
12     4 g3     8.94 0.776
13     5 g1     3.52 0.885
14     5 g2     4.59 0.560
15     5 g3     7.86 0.296

#Reshaped to wide
wide_df <- long_df %>% 
  pivot_wider(id_cols = id, names_from = 'group', values_from = c('mean','sd'))
wide_df

# A tibble: 5 x 7
     id mean_g1 mean_g2 mean_g3 sd_g1 sd_g2 sd_g3
  <int>   <dbl>   <dbl>   <dbl> <dbl> <dbl> <dbl>
1     1    3.77    3.32    5.97 0.677 0.224 0.374
2     2    1.51    5.22    5.35 0.375 0.698 0.547
3     3    8.31    4.33    5.92 0.718 0.550 0.755
4     4    2.53    6.62    8.94 0.435 0.192 0.776
5     5    3.52    4.59    7.86 0.885 0.560 0.296

#Wide with proper column order
final_df <- wide_df %>% 
  select(id, mean_g1, sd_g1, mean_g2, sd_g2, mean_g3, sd_g3)
final_df

# A tibble: 5 x 7
     id mean_g1 sd_g1 mean_g2 sd_g2 mean_g3 sd_g3
  <int>   <dbl> <dbl>   <dbl> <dbl>   <dbl> <dbl>
1     1    3.77 0.677    3.32 0.224    5.97 0.374
2     2    1.51 0.375    5.22 0.698    5.35 0.547
3     3    8.31 0.718    4.33 0.550    5.92 0.755
4     4    2.53 0.435    6.62 0.192    8.94 0.776
5     5    3.52 0.885    4.59 0.560    7.86 0.296

Does someone know a way to either a) pivot the long data with the output being in the proper order or b) use a regex or sort variable to reorder the columns into the desired order? A tidyverse style solution would be preferred, but other approaches are welcome too.

Upvotes: 6

Views: 1857

Answers (1)

eipi10
eipi10

Reputation: 93871

You can run ends_with successively on a character vector of the numeric suffixes:

long_df %>% 
  pivot_wider(names_from=group, values_from=c(mean, sd)) %>% 
  select(id, ends_with(as.character(1:3)))
     id mean_g1 sd_g1 mean_g2 sd_g2 mean_g3 sd_g3
  <int>   <dbl> <dbl>   <dbl> <dbl>   <dbl> <dbl>
1     1    3.77 0.677    3.32 0.224    5.97 0.374
2     2    1.51 0.375    5.22 0.698    5.35 0.547
3     3    8.31 0.718    4.33 0.550    5.92 0.755
4     4    2.53 0.435    6.62 0.192    8.94 0.776
5     5    3.52 0.885    4.59 0.560    7.86 0.296

This might need the development version of dplyr to work (devtools::install_github("tidyverse/dplyr")). I'm running the development version and haven't checked this code with the current CRAN release.

You can also pivot_longer to the "longest" version of the data frame and then pivot_wider to get the desired column order without resorting:

long_df %>% 
  pivot_longer(cols=c(mean, sd)) %>% 
  pivot_wider(names_from=c(name, group), values_from=value)
     id mean_g1 sd_g1 mean_g2 sd_g2 mean_g3 sd_g3
  <int>   <dbl> <dbl>   <dbl> <dbl>   <dbl> <dbl>
1     1    3.77 0.677    3.32 0.224    5.97 0.374
2     2    1.51 0.375    5.22 0.698    5.35 0.547
3     3    8.31 0.718    4.33 0.550    5.92 0.755
4     4    2.53 0.435    6.62 0.192    8.94 0.776
5     5    3.52 0.885    4.59 0.560    7.86 0.296

Upvotes: 4

Related Questions