Prasanna S
Prasanna S

Reputation: 363

R dplyr gather wide to long multiple columns multiple values

I have the following wide-form data:

identity = c("Race1", "Race2", "Race3")
total_2017 = c(300,325,350)
total_2018 = c(200,225,250)
total_2019 = c(100,150,200)
pct_2017 = total_2017/sum(total_2017[1],total_2018[1],total_2019[1])
pct_2018 = total_2018/sum(total_2017[2],total_2018[2],total_2019[2])
pct_2019 = total_2019/sum(total_2017[3],total_2018[3],total_2019[3])
df.wide <- cbind.data.frame(identity, total_2017, total_2018, total_2019, pct_2017, pct_2018, pct_2019)

The wide data looks like this:

     identity total_2017 total_2018 total_2019  pct_2017  pct_2018 pct_2019
1    Race1        300        200        100 0.5000000 0.2857143   0.1250
2    Race2        325        225        150 0.5416667 0.3214286   0.1875
3    Race3        350        250        200 0.5833333 0.3571429   0.2500

The 3rd, 4th and 5th columns are the totals of "identity" for years 2017 to 2019, and the last three columns are the respective shares. I want to convert it into long format such that the totals are gathered into a column Enrollment and the percentages are gathered into a column Percent. I try the following code:

    library(dplyr)
    library(magrittr)
    library(tidyr)

df.long <- df.wide %>% 
  gather(key = "Total", value = "Enrollment", starts_with("total_")) %>%
  gather(key = "Share", value = "Percent", starts_with("pct_"))

Here are the first 10 rows of the long form data.

    head(df.long, 10)
   identity      Total Enrollment    Share   Percent
1     Race1 total_2017        300 pct_2017 0.5000000
2     Race2 total_2017        325 pct_2017 0.5416667
3     Race3 total_2017        350 pct_2017 0.5833333
4     Race1 total_2018        200 pct_2017 0.5000000
5     Race2 total_2018        225 pct_2017 0.5416667
6     Race3 total_2018        250 pct_2017 0.5833333
7     Race1 total_2019        100 pct_2017 0.5000000
8     Race2 total_2019        150 pct_2017 0.5416667
9     Race3 total_2019        200 pct_2017 0.5833333
10    Race1 total_2017        300 pct_2018 0.2857143

As can be seen, the Enrollment and Percent are ordered differently. How to have the same order in the columns?

Upvotes: 1

Views: 1248

Answers (1)

akrun
akrun

Reputation: 887851

This can be done with pivot_longer which can reshape multiple sets of columns

library(dplyr)
library(tidyr)
df.wide %>% 
   pivot_longer(cols = -identity, names_to = c('.value', 'year'), 
         names_sep="_") %>%
   arrange(year)
# A tibble: 9 x 4
#  identity year  total   pct
#  <chr>    <chr> <dbl> <dbl>
#1 Race1    2017    300 0.5  
#2 Race2    2017    325 0.542
#3 Race3    2017    350 0.583
#4 Race1    2018    200 0.286
#5 Race2    2018    225 0.321
#6 Race3    2018    250 0.357
#7 Race1    2019    100 0.125
#8 Race2    2019    150 0.188
#9 Race3    2019    200 0.25 

Upvotes: 3

Related Questions