Reputation: 2877

Reordering columns using common names - dplyr

My data comes from a database which, depending on when I run my SQL query could contain different values for POS from one week to the other.

Not knowing which values will be in a variable makes it very hard to automate the creation of a report.

My data looks as follows:

sample <- data.frame(DRUG = c("A","A","B"),POS = c("Hospital","Physician","Home"),GROSS_COST = c(50,100,60), NET_COST = c(45,80,40))

I need to pivot this data frame wider so that there's a column for each pos by cost (gross & net).

This can be easily achieve using pivot_wider:

x <- sample %>% pivot_wider(names_from = POS, values_from = c(GROSS_COST,NET_COST))

Objective I would like to be able to keep the columns for each POS together i.e. the GROSS_COST_Hospital and NET_COST_Hospital would be side by side, similar for all other POS columns.

Is there an elegant way to group columns using string matching?

Upvotes: 2

Answers (4)

AnilGoyal

Reputation: 26238

With the advent of tidyr 1.2.0, the issue is finally resolved, you may do this directly using names_vary argument

library(tidyr)
sample <- data.frame(DRUG = c("A","A","B"),POS = c("Hospital","Physician","Home"),GROSS_COST = c(50,100,60), NET_COST = c(45,80,40))

sample %>% 
  pivot_wider(names_from = POS, values_from = c(GROSS_COST,NET_COST), names_vary = 'slowest')
#> # A tibble: 2 x 7
#>   DRUG  GROSS_COST_Hospital NET_COST_Hospital GROSS_COST_Physi~ NET_COST_Physic~
#>   <chr>               <dbl>             <dbl>             <dbl>            <dbl>
#> 1 A                      50                45               100               80
#> 2 B                      NA                NA                NA               NA
#> # ... with 2 more variables: GROSS_COST_Home <dbl>, NET_COST_Home <dbl>

^{Created on 2022-02-18 by the reprex package (v2.0.1)}

Upvotes: 0

ThomasIsCoding

Reputation: 102710

A data.table option using dcast + melt

> dcast(melt(setDT(sample), id.vars = c("DRUG", "POS")), DRUG ~ variable + POS)
   DRUG GROSS_COST_Home GROSS_COST_Hospital GROSS_COST_Physician NET_COST_Home
1:    A              NA                  50                  100            NA
2:    B              60                  NA                   NA            40
   NET_COST_Hospital NET_COST_Physician
1:                45                 80
2:                NA                 NA

Upvotes: 1

akrun

Reputation: 887891

We can do an ordering on the select step

library(dplyr)
library(tidyr)
library(stringr)
sample %>% 
  pivot_wider(names_from = POS, values_from = c(GROSS_COST,NET_COST)) %>% 
  select(DRUG, names(.)[-1][order(str_extract(names(.)[-1], '[^_]+$'))])
# A tibble: 2 x 7
#  DRUG  GROSS_COST_Home NET_COST_Home GROSS_COST_Hospital NET_COST_Hospital GROSS_COST_Physician NET_COST_Physician
#  <chr>           <dbl>         <dbl>               <dbl>             <dbl>                <dbl>              <dbl>
#1 A                  NA            NA                  50                45                  100                 80
#2 B                  60            40                  NA                NA                   NA                 NA

Upvotes: 2

Ronak Shah

Reputation: 389275

Unfortunately, I don't think there is a direct solution to this (yet!). See https://github.com/tidyverse/tidyr/issues/839 .

For now you can get the data in long format so you can control their ordering the way you want.

library(tidyr)

sample %>%
  pivot_longer(cols = c(GROSS_COST, NET_COST)) %>%
  pivot_wider(names_from = c(name, POS), values_from = value)

#   DRUG  GROSS_COST_Hosp… NET_COST_Hospit… GROSS_COST_Phys… NET_COST_Physic…
#  <chr>            <dbl>            <dbl>            <dbl>            <dbl>
#1 A                   50               45              100               80
#2 B                   NA               NA               NA               NA
# … with 2 more variables: GROSS_COST_Home <dbl>, NET_COST_Home <dbl>

Upvotes: 3

Reordering columns using common names - dplyr

Answers (4)

Related Questions