Reputation: 27
I have a data frame structured like this:
+----------+------+--------+-------+
| Location | year | group1 | Value |
+----------+------+--------+-------+
| a | 2020 | 1 | x |
| a | 2020 | 2 | y |
| a | 2020 | 3 | z |
| a | 2021 | 1 | x |
| a | 2021 | 2 | y |
| a | 2021 | 3 | z |
| b | 2020 | 1 | x |
| b | 2020 | 2 | y |
| b | 2020 | 3 | z |
+----------+------+--------+-------+
I would like to expand the data frame to include 3 rows for every location, year, and group1 combination and generate a group2 variable that identifies these new combinations (1-3). Ideally, the data frame will look like this:
+----------+------+--------+-------+--------+
| Location | year | group1 | Value | group2 |
+----------+------+--------+-------+--------+
| a | 2020 | 1 | x | 1 |
| a | 2020 | 1 | x | 2 |
| a | 2020 | 1 | x | 3 |
| a | 2020 | 2 | y | 1 |
| a | 2020 | 2 | y | 2 |
| a | 2020 | 2 | y | 3 |
| ... | ... |... |... |... |
+----------+------+--------+-------+--------+
I was able to expand the data frame to the correct total number of rows using the following code:
df[rep(seq_len(nrow(df)),3), 1:4]
but I couldn't figure out how to add the group2 variable shown above.
Upvotes: 1
Views: 260
Reputation: 886938
We can use crossing from tidyr:
library(tidyr)
library(dplyr)
crossing(df1, group2 = 1:3)
# A tibble: 27 x 5
# Location year group1 Value group2
# <chr> <int> <int> <chr> <int>
# 1 a 2020 1 x 1
# 2 a 2020 1 x 2
# 3 a 2020 1 x 3
# 4 a 2020 2 y 1
# 5 a 2020 2 y 2
# 6 a 2020 2 y 3
# 7 a 2020 3 z 1
# 8 a 2020 3 z 2
# 9 a 2020 3 z 3
#10 a 2021 1 x 1
# … with 17 more rows
Or create a list column and then unnest it:
df1 %>%
mutate(group2 = list(1:3)) %>%
unnest(c(group2))
df1 <- structure(list(Location = c("a", "a", "a", "a", "a", "a", "b",
"b", "b"), year = c(2020L, 2020L, 2020L, 2021L, 2021L, 2021L,
2020L, 2020L, 2020L), group1 = c(1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L), Value = c("x", "y", "z", "x", "y", "z", "x", "y", "z"
)), class = "data.frame", row.names = c(NA, -9L))
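For comparison, the same cross join can be done in base R without tidyr: merge() with by = NULL returns the Cartesian product of its two inputs, so every row of df1 is paired with every value of group2. This is a minimal sketch that recreates df1 so it runs on its own; the result name out is illustrative only, and the order() call is an assumption about the row order you want, since merge() does not sort the product the way crossing() does.

```r
# Recreate the example data from the question
df1 <- data.frame(
  Location = c("a", "a", "a", "a", "a", "a", "b", "b", "b"),
  year     = c(2020L, 2020L, 2020L, 2021L, 2021L, 2021L, 2020L, 2020L, 2020L),
  group1   = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L),
  Value    = c("x", "y", "z", "x", "y", "z", "x", "y", "z"),
  stringsAsFactors = FALSE
)

# by = NULL makes merge() return the Cartesian product:
# each of the 9 rows of df1 paired with each of the 3 group2 values
out <- merge(df1, data.frame(group2 = 1:3), by = NULL)  # 'out' is an illustrative name

# Put the three copies of each original row next to each other
out <- out[order(out$Location, out$year, out$group1, out$group2), ]
rownames(out) <- NULL
head(out, 6)
```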
Upvotes: 0
Reputation: 3043
Here is the solution you are looking for
library(dplyr)
# 1. Data set (a plain data.frame, so only dplyr needs to be loaded)
df <- data.frame(
location = c("a","a","a","a","a","a","b","b","b"),
year = c(2020,2020,2020,2021,2021,2021,2020,2020,2020),
group1 = c(1,2,3,1,2,3,1,2,3),
value = c("x","y","z","x","y","z","x","y","z"),
stringsAsFactors = FALSE)
# 2. Your code to expand data frame
df <- df[rep(seq_len(nrow(df)), 3), 1:4]
# 3. Arrange
df <- df %>% arrange(location, year, group1, value)
# 4. Add 'group2': group1 is constant within each group, so the running
#    sum is group1, 2*group1, 3*group1 and dividing by group1 gives 1, 2, 3
df <- df %>%
group_by(location, year, group1, value) %>%
mutate(group2 = cumsum(group1) / group1) %>%
ungroup() %>%
arrange(location, year, group1, value, group2)
Hope it works
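A more direct way to number the copies is a per-group counter rather than the cumsum trick. As a minimal base R sketch (not the answer's own method), ave() with FUN = seq_along numbers the rows 1, 2, 3 within each location/year/group1/value group, much like row_number() inside group_by() in dplyr. It assumes the lowercase column names used in the answer above and recreates the data so it runs on its own.

```r
# Example data, using the lowercase column names from the answer above
df <- data.frame(
  location = c("a","a","a","a","a","a","b","b","b"),
  year     = c(2020, 2020, 2020, 2021, 2021, 2021, 2020, 2020, 2020),
  group1   = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  value    = c("x","y","z","x","y","z","x","y","z"),
  stringsAsFactors = FALSE
)

# Same expansion as in the question: three stacked copies of the data
df <- df[rep(seq_len(nrow(df)), 3), 1:4]

# ave() splits the row indices by the grouping columns and applies
# seq_along to each group, giving 1, 2, 3 in order of appearance
df$group2 <- ave(seq_len(nrow(df)), df$location, df$year, df$group1, df$value,
                 FUN = seq_along)

# Sort so each original row's three copies are adjacent
df <- df[order(df$location, df$year, df$group1, df$group2), ]
rownames(df) <- NULL
```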
Upvotes: 0
Reputation: 30474
With tidyr you can use expand; this will expand your data frame to all combinations of values with your sequence of 1 to 3:
library(tidyverse)
df %>%
group_by(Location, year, group1, Value) %>%
expand(group2 = 1:3)
Output
Location year group1 Value group2
<fct> <dbl> <int> <fct> <int>
1 a 2020 1 x 1
2 a 2020 1 x 2
3 a 2020 1 x 3
4 a 2020 2 y 1
5 a 2020 2 y 2
6 a 2020 2 y 3
...
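Your approach looks close, and I suppose you could just add on group2 like this. Note each = 3 instead of times: it keeps the three copies of each row adjacent, so the recycled group2 = 1:3 lines up with them: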
cbind(df[rep(seq_len(nrow(df)), each = 3), ], group2 = 1:3)
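If you would rather keep the question's original times-style repetition, where rep(seq_len(nrow(df)), 3) stacks three complete copies of the data, the copy number is the block index instead: rep(1:3, each = nrow(df)). A minimal sketch, assuming the Location/year/group1/Value columns from the question; the names n and out are illustrative only, and a final order() call brings each row's copies together.

```r
df <- data.frame(
  Location = c("a","a","a","a","a","a","b","b","b"),
  year     = c(2020, 2020, 2020, 2021, 2021, 2021, 2020, 2020, 2020),
  group1   = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  Value    = c("x","y","z","x","y","z","x","y","z"),
  stringsAsFactors = FALSE
)

n <- nrow(df)  # illustrative helper: rows per copy
# Three stacked copies of the whole data frame (the question's code)
out <- df[rep(seq_len(n), times = 3), 1:4]
# Copy 1 is the first n rows, copy 2 the next n rows, copy 3 the last n
out$group2 <- rep(1:3, each = n)
# Bring the three copies of each original row together
out <- out[order(out$Location, out$year, out$group1, out$group2), ]
rownames(out) <- NULL
```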
Upvotes: 1