Reputation: 363
I have the following wide-form data:
identity = c("Race1", "Race2", "Race3")
total_2017 = c(300,325,350)
total_2018 = c(200,225,250)
total_2019 = c(100,150,200)
pct_2017 = total_2017/sum(total_2017[1],total_2018[1],total_2019[1])
pct_2018 = total_2018/sum(total_2017[2],total_2018[2],total_2019[2])
pct_2019 = total_2019/sum(total_2017[3],total_2018[3],total_2019[3])
df.wide <- cbind.data.frame(identity, total_2017, total_2018, total_2019, pct_2017, pct_2018, pct_2019)
The wide data looks like this:
identity total_2017 total_2018 total_2019 pct_2017 pct_2018 pct_2019
1 Race1 300 200 100 0.5000000 0.2857143 0.1250
2 Race2 325 225 150 0.5416667 0.3214286 0.1875
3 Race3 350 250 200 0.5833333 0.3571429 0.2500
The 3rd, 4th and 5th columns are the totals of "identity" for years 2017 to 2019, and the last three columns are the respective shares.
I want to convert it into long format such that the totals
are gathered into a column Enrollment
and the percentages are gathered into a column Percent
. I try the following code:
library(dplyr)
library(magrittr)
library(tidyr)
df.long <- df.wide %>%
gather(key = "Total", value = "Enrollment", starts_with("total_")) %>%
gather(key = "Share", value = "Percent", starts_with("pct_"))
Here are the first 10 rows of the long form data.
head(df.long, 10)
identity Total Enrollment Share Percent
1 Race1 total_2017 300 pct_2017 0.5000000
2 Race2 total_2017 325 pct_2017 0.5416667
3 Race3 total_2017 350 pct_2017 0.5833333
4 Race1 total_2018 200 pct_2017 0.5000000
5 Race2 total_2018 225 pct_2017 0.5416667
6 Race3 total_2018 250 pct_2017 0.5833333
7 Race1 total_2019 100 pct_2017 0.5000000
8 Race2 total_2019 150 pct_2017 0.5416667
9 Race3 total_2019 200 pct_2017 0.5833333
10 Race1 total_2017 300 pct_2018 0.2857143
As can be seen, the Enrollment and Percent are ordered differently. How to have the same order in the columns?
Upvotes: 1
Views: 1248
Reputation: 887851
This can be done with pivot_longer
which can reshape multiple sets of columns
library(dplyr)
library(tidyr)
df.wide %>%
pivot_longer(cols = -identity, names_to = c('.value', 'year'),
names_sep="_") %>%
arrange(year)
# A tibble: 9 x 4
# identity year total pct
# <chr> <chr> <dbl> <dbl>
#1 Race1 2017 300 0.5
#2 Race2 2017 325 0.542
#3 Race3 2017 350 0.583
#4 Race1 2018 200 0.286
#5 Race2 2018 225 0.321
#6 Race3 2018 250 0.357
#7 Race1 2019 100 0.125
#8 Race2 2019 150 0.188
#9 Race3 2019 200 0.25
Upvotes: 3