Reputation: 2877
My data comes from a database which, depending on when I run my SQL query could contain different values for POS from one week to the other.
Not knowing which values will be in a variable makes it very hard to automate the creation of a report.
My data looks as follows:
sample <- data.frame(DRUG = c("A","A","B"),POS = c("Hospital","Physician","Home"),GROSS_COST = c(50,100,60), NET_COST = c(45,80,40))
I need to pivot this data frame wider so that there's a column for each pos by cost (gross & net).
This can be easily achieve using pivot_wider:
x <- sample %>% pivot_wider(names_from = POS, values_from = c(GROSS_COST,NET_COST))
Objective I would like to be able to keep the columns for each POS together i.e. the GROSS_COST_Hospital and NET_COST_Hospital would be side by side, similar for all other POS columns.
Is there an elegant way to group columns using string matching?
Upvotes: 2
Views: 90
Reputation: 26238
With the advent of tidyr 1.2.0, the issue is finally resolved, you may do this directly using names_vary
argument
library(tidyr)
sample <- data.frame(DRUG = c("A","A","B"),POS = c("Hospital","Physician","Home"),GROSS_COST = c(50,100,60), NET_COST = c(45,80,40))
sample %>%
pivot_wider(names_from = POS, values_from = c(GROSS_COST,NET_COST), names_vary = 'slowest')
#> # A tibble: 2 x 7
#> DRUG GROSS_COST_Hospital NET_COST_Hospital GROSS_COST_Physi~ NET_COST_Physic~
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 A 50 45 100 80
#> 2 B NA NA NA NA
#> # ... with 2 more variables: GROSS_COST_Home <dbl>, NET_COST_Home <dbl>
Created on 2022-02-18 by the reprex package (v2.0.1)
Upvotes: 0
Reputation: 102710
A data.table
option using dcast
+ melt
> dcast(melt(setDT(sample), id.vars = c("DRUG", "POS")), DRUG ~ variable + POS)
DRUG GROSS_COST_Home GROSS_COST_Hospital GROSS_COST_Physician NET_COST_Home
1: A NA 50 100 NA
2: B 60 NA NA 40
NET_COST_Hospital NET_COST_Physician
1: 45 80
2: NA NA
Upvotes: 1
Reputation: 887891
We can do an ordering on the select
step
library(dplyr)
library(tidyr)
library(stringr)
sample %>%
pivot_wider(names_from = POS, values_from = c(GROSS_COST,NET_COST)) %>%
select(DRUG, names(.)[-1][order(str_extract(names(.)[-1], '[^_]+$'))])
# A tibble: 2 x 7
# DRUG GROSS_COST_Home NET_COST_Home GROSS_COST_Hospital NET_COST_Hospital GROSS_COST_Physician NET_COST_Physician
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 A NA NA 50 45 100 80
#2 B 60 40 NA NA NA NA
Upvotes: 2
Reputation: 389275
Unfortunately, I don't think there is a direct solution to this (yet!). See https://github.com/tidyverse/tidyr/issues/839 .
For now you can get the data in long format so you can control their ordering the way you want.
library(tidyr)
sample %>%
pivot_longer(cols = c(GROSS_COST, NET_COST)) %>%
pivot_wider(names_from = c(name, POS), values_from = value)
# DRUG GROSS_COST_Hosp… NET_COST_Hospit… GROSS_COST_Phys… NET_COST_Physic…
# <chr> <dbl> <dbl> <dbl> <dbl>
#1 A 50 45 100 80
#2 B NA NA NA NA
# … with 2 more variables: GROSS_COST_Home <dbl>, NET_COST_Home <dbl>
Upvotes: 3