Reputation: 33
Hi people of Stack Overflow,
I'm having trouble reshaping my data frame efficiently. My original data frame looks like this:
region transportation_type X2020.01.13 X2020.01.14 X2020.01.15 X2020.01.16 X2020.01.17
1 Akron driving 100.0 103.06 107.50 106.14 123.62
2 Akron transit 100.0 106.69 103.75 100.22 89.04
3 Akron walking 100.0 97.23 79.05 74.77 89.55
4 Albany driving 100.0 102.35 107.35 105.54 128.97
5 Albany transit 100.0 100.14 105.95 107.76 101.39
6 Albany walking 100.0 108.36 113.36 107.52 129.43
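To make the code in the answers reproducible, the frame above can be rebuilt in base R. This is a sketch that assumes the object is named df, which is the name the first two answers use:

```r
# Rebuild the question's data frame; `df` is the name assumed by the answers
df <- data.frame(
  region = rep(c("Akron", "Albany"), each = 3),
  transportation_type = rep(c("driving", "transit", "walking"), times = 2),
  X2020.01.13 = rep(100, 6),
  X2020.01.14 = c(103.06, 106.69, 97.23, 102.35, 100.14, 108.36),
  X2020.01.15 = c(107.50, 103.75, 79.05, 107.35, 105.95, 113.36),
  X2020.01.16 = c(106.14, 100.22, 74.77, 105.54, 107.76, 107.52),
  X2020.01.17 = c(123.62, 89.04, 89.55, 128.97, 101.39, 129.43)
)
str(df)
```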
To merge it with some other data, I want to convert the transportation_type values
into columns (wide format) and the date columns X2020.01.13 through X2020.01.17
into a single column (long format), like so:
region date driving transit walking
1 Akron X2020.01.13 100.0 100.0 100.0
2 Akron X2020.01.14 103.06 106.69 97.23
3 Akron X2020.01.15 107.50 103.75 79.05
4 Akron X2020.01.16 106.14 100.22 74.77
5 Akron X2020.01.17 123.62 89.04 89.55
6 Albany X2020.01.13 100.0 100.0 100.0
7 Albany X2020.01.14 102.35 100.14 108.36
8 Albany X2020.01.15 107.35 105.95 113.36
9 Albany X2020.01.16 105.54 107.76 107.52
10 Albany X2020.01.17 128.97 101.39 129.43
I can reshape it in two steps (e.g. with the "melt"
command), first converting transportation_type
into wide format and then the dates into long format.
Can I do it more efficiently in one step?
Thanks for your help!
Upvotes: 2
Views: 1964
Reputation: 101335
Here is a base R option using nested reshapes:
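`row.names<-`(reshape(
  # inner call: dates to long format, one "val" column plus a "time" column holding the date names
  reshape(
    df,
    direction = "long",
    idvar = c("region", "transportation_type"),
    varying = -(1:2),
    times = names(df)[-c(1:2)],
    v.names = "val"
  ),
  # outer call: transportation_type to wide format (val.driving, val.transit, val.walking)
  direction = "wide",
  idvar = c("time", "region"),
  timevar = "transportation_type"
), NULL)  # `row.names<-`(..., NULL) resets the row names to 1, 2, ...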
which gives
region time val.driving val.transit val.walking
1 Akron X2020.01.13 100.00 100.00 100.00
2 Albany X2020.01.13 100.00 100.00 100.00
3 Akron X2020.01.14 103.06 106.69 97.23
4 Albany X2020.01.14 102.35 100.14 108.36
5 Akron X2020.01.15 107.50 103.75 79.05
6 Albany X2020.01.15 107.35 105.95 113.36
7 Akron X2020.01.16 106.14 100.22 74.77
8 Albany X2020.01.16 105.54 107.76 107.52
9 Akron X2020.01.17 123.62 89.04 89.55
10 Albany X2020.01.17 128.97 101.39 129.43
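If you want the exact layout asked for in the question (a date column, plain driving/transit/walking names, rows sorted by region), the same two reshape() calls can be run step by step and tidied afterwards. This is a sketch assuming data shaped like the question's; long and wide are just illustrative object names. The timevar argument of the long step names the new date column, and the "val." prefix that the wide step adds is stripped with sub():

```r
# The question's data
df <- data.frame(
  region = rep(c("Akron", "Albany"), each = 3),
  transportation_type = rep(c("driving", "transit", "walking"), times = 2),
  X2020.01.13 = rep(100, 6),
  X2020.01.14 = c(103.06, 106.69, 97.23, 102.35, 100.14, 108.36),
  X2020.01.15 = c(107.50, 103.75, 79.05, 107.35, 105.95, 113.36),
  X2020.01.16 = c(106.14, 100.22, 74.77, 105.54, 107.76, 107.52),
  X2020.01.17 = c(123.62, 89.04, 89.55, 128.97, 101.39, 129.43)
)

# Step 1: dates to long format; the time column is called "date", the values "val"
long <- reshape(
  df,
  direction = "long",
  idvar = c("region", "transportation_type"),
  varying = names(df)[-(1:2)],
  times = names(df)[-(1:2)],
  v.names = "val",
  timevar = "date"
)

# Step 2: transportation_type to wide format (val.driving, val.transit, val.walking)
wide <- reshape(
  long,
  direction = "wide",
  idvar = c("region", "date"),
  timevar = "transportation_type"
)

# Drop the "val." prefix, sort by region then date, and reset the row names
names(wide) <- sub("^val\\.", "", names(wide))
wide <- wide[order(wide$region, wide$date), ]
rownames(wide) <- NULL
wide
```

Sorting works here because the date names share the same X-prefixed, zero-padded format, so lexical order equals chronological order.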
Upvotes: 1
Reputation: 78927
Here is another approach that pivots wide, then long, then wide again:
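library(dplyr)
library(tidyr)
df %>%
  # one column per date/transport combination, e.g. X2020.01.13_driving
  pivot_wider(
    names_from = transportation_type,
    values_from = 3:7
  ) %>%
  # back to long: one row per region and combined name, values in "value"
  pivot_longer(
    cols = starts_with("X"),
    names_to = "date"
  ) %>%
  # split e.g. X2020.01.13_driving into date and transportation
  separate(date, c("date", "transportation"), sep = "_") %>%
  # spread the transport types into their own columns
  pivot_wider(
    names_from = transportation
  )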
# A tibble: 10 x 5
region date driving transit walking
<chr> <chr> <dbl> <dbl> <dbl>
1 Akron X2020.01.13 100 100 100
2 Akron X2020.01.14 103. 107. 97.2
3 Akron X2020.01.15 108. 104. 79.0
4 Akron X2020.01.16 106. 100. 74.8
5 Akron X2020.01.17 124. 89.0 89.6
6 Albany X2020.01.13 100 100 100
7 Albany X2020.01.14 102. 100. 108.
8 Albany X2020.01.15 107. 106. 113.
9 Albany X2020.01.16 106. 108. 108.
10 Albany X2020.01.17 129. 101. 129.
Upvotes: 2
Reputation: 1306
There aren't any functions in base R or the major reshaping packages that can simultaneously pivot in both directions.
In general, I would recommend switching to the tidyr::pivot_wider()
and tidyr::pivot_longer()
functions. They are actively maintained (the reshape and reshape2 packages, not to be confused with base R's reshape() function, no longer receive feature updates), and they are easier to work with.
dat <- tibble::tribble(
~region, ~transportation_type, ~X2020.01.13, ~X2020.01.14, ~X2020.01.15, ~X2020.01.16, ~X2020.01.17,
"Akron", "driving", 100.0, 103.06, 107.50, 106.14, 123.62,
"Akron", "transit", 100.0, 106.69, 103.75, 100.22, 89.04,
"Akron", "walking", 100.0, 97.23, 79.05, 74.77, 89.55,
"Albany", "driving", 100.0, 102.35, 107.35, 105.54, 128.97,
"Albany", "transit", 100.0, 100.14, 105.95, 107.76, 101.39,
"Albany", "walking", 100.0, 108.36, 113.36, 107.52, 129.43
)
# the native pipe |> requires R >= 4.1.0
dat |>
  tidyr::pivot_longer(
    cols = -c(region, transportation_type),
    names_to = "date",
    values_to = "values"
  ) |>
  tidyr::pivot_wider(
    names_from = transportation_type,
    values_from = values
  )
#> # A tibble: 10 x 5
#> region date driving transit walking
#> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 Akron X2020.01.13 100 100 100
#> 2 Akron X2020.01.14 103. 107. 97.2
#> 3 Akron X2020.01.15 108. 104. 79.0
#> 4 Akron X2020.01.16 106. 100. 74.8
#> 5 Akron X2020.01.17 124. 89.0 89.6
#> 6 Albany X2020.01.13 100 100 100
#> 7 Albany X2020.01.14 102. 100. 108.
#> 8 Albany X2020.01.15 107. 106. 113.
#> 9 Albany X2020.01.16 106. 108. 108.
#> 10 Albany X2020.01.17 129. 101. 129.
Created on 2021-08-22 by the reprex package (v2.0.0)
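For this particular layout, the double pivot amounts to a per-region transpose: each region's 3 x 5 block of values (transport types by dates) becomes a 5 x 3 block (dates by transport types). The base R sketch here only illustrates that observation and is not a general reshaping tool; vals, blocks and out are made-up names, and it assumes every region has exactly one row per transportation type:

```r
# The question's data
df <- data.frame(
  region = rep(c("Akron", "Albany"), each = 3),
  transportation_type = rep(c("driving", "transit", "walking"), times = 2),
  X2020.01.13 = rep(100, 6),
  X2020.01.14 = c(103.06, 106.69, 97.23, 102.35, 100.14, 108.36),
  X2020.01.15 = c(107.50, 103.75, 79.05, 107.35, 105.95, 113.36),
  X2020.01.16 = c(106.14, 100.22, 74.77, 105.54, 107.76, 107.52),
  X2020.01.17 = c(123.62, 89.04, 89.55, 128.97, 101.39, 129.43)
)

# Numeric matrix: rows are region/type combinations, columns are dates
vals <- as.matrix(df[, -(1:2)])

# For each region, transpose its block so dates become rows and types become columns
blocks <- lapply(split(seq_len(nrow(df)), df$region), function(i) {
  block <- t(vals[i, , drop = FALSE])
  colnames(block) <- df$transportation_type[i]
  data.frame(region = df$region[i[1]], date = rownames(block), block, row.names = NULL)
})

# Stack the per-region pieces (rbind matches the type columns by name)
out <- do.call(rbind, blocks)
rownames(out) <- NULL
out
```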
Upvotes: 2