Nick M.

Reputation: 27

Expand data frame and add a new variable

I have a data frame structured like this:

+----------+------+--------+-------+
| Location | year | group1 | Value |
+----------+------+--------+-------+
| a        | 2020 | 1      | x     |
| a        | 2020 | 2      | y     |
| a        | 2020 | 3      | z     |
| a        | 2021 | 1      | x     |
| a        | 2021 | 2      | y     |
| a        | 2021 | 3      | z     |
| b        | 2020 | 1      | x     |
| b        | 2020 | 2      | y     |
| b        | 2020 | 3      | z     |
+----------+------+--------+-------+

I would like to expand the data frame to include 3 rows for every location, year, and group1 combination and generate a group2 variable that identifies these new combinations (1-3). Ideally, the data frame will look like this:

+----------+------+--------+-------+--------+
| Location | year | group1 | Value | group2 |
+----------+------+--------+-------+--------+
| a        | 2020 | 1      | x     | 1      |
| a        | 2020 | 1      | x     | 2      |
| a        | 2020 | 1      | x     | 3      |
| a        | 2020 | 2      | y     | 1      |
| a        | 2020 | 2      | y     | 2      |
| a        | 2020 | 2      | y     | 3      |
| ...      | ...  | ...    | ...   | ...    |
+----------+------+--------+-------+--------+

I was able to expand the dataframe to the correct number of total rows using the following code:

df[rep(seq_len(nrow(df)),3), 1:4]

However, I couldn't figure out how to add the group2 variable shown above.

Upvotes: 1

Views: 260

Answers (3)

akrun

Reputation: 886938

We can use crossing() from tidyr:

library(tidyr)
library(dplyr)
crossing(df1, group2 = 1:3)
# A tibble: 27 x 5
#   Location  year group1 Value group2
#   <chr>    <int>  <int> <chr>  <int>
# 1 a         2020      1 x          1
# 2 a         2020      1 x          2
# 3 a         2020      1 x          3
# 4 a         2020      2 y          1
# 5 a         2020      2 y          2
# 6 a         2020      2 y          3
# 7 a         2020      3 z          1
# 8 a         2020      3 z          2
# 9 a         2020      3 z          3
#10 a         2021      1 x          1
# … with 17 more rows

Or create a list column and then unnest

df1 %>%
  mutate(group2 = list(1:3)) %>%
  unnest(c(group2))

data

df1 <- structure(list(Location = c("a", "a", "a", "a", "a", "a", "b", 
"b", "b"), year = c(2020L, 2020L, 2020L, 2021L, 2021L, 2021L, 
2020L, 2020L, 2020L), group1 = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L), Value = c("x", "y", "z", "x", "y", "z", "x", "y", "z"
)), class = "data.frame", row.names = c(NA, -9L))
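For readers without tidyr, the same cross join can probably be written in base R. This is a minimal sketch and not part of the original answer: it assumes merge() with by = NULL, which returns the Cartesian product of its two inputs, and it rebuilds df1 with data.frame() rather than structure(). The name out is only illustrative.

```r
# Same sample data as in the "data" section above
df1 <- data.frame(
  Location = c("a", "a", "a", "a", "a", "a", "b", "b", "b"),
  year     = c(2020L, 2020L, 2020L, 2021L, 2021L, 2021L, 2020L, 2020L, 2020L),
  group1   = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L),
  Value    = c("x", "y", "z", "x", "y", "z", "x", "y", "z")
)

# by = NULL asks merge() for every pairing of a df1 row with a group2 value
out <- merge(df1, data.frame(group2 = 1:3), by = NULL)

# merge() cycles through the rows of df1 fastest, so sort to get the
# layout shown in the question
out <- out[order(out$Location, out$year, out$group1, out$group2), ]
rownames(out) <- NULL

head(out)
```

Unlike crossing(), this does not sort on its own, which is why the explicit order() step is there.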

Upvotes: 0

Andrii

Reputation: 3043

Here is the solution you are looking for

library(dplyr)

# 1. Data set
df <- data.frame(
  location = c("a","a","a","a","a","a","b","b","b"),
  year = c(2020,2020,2020,2021,2021,2021,2020,2020,2020),
  group1 = c(1,2,3,1,2,3,1,2,3),
  value = c("x","y","z","x","y","z","x","y","z"),
  stringsAsFactors = FALSE)

# 2. Your code to expand data frame
df <- df[rep(seq_len(nrow(df)), 3), 1:4]

# 3. Arrange
df <- df %>% arrange(location, year, group1, value)

# 4. Add 'group2': group1 is constant within each group, so
#    cumsum(group1) / group1 counts 1, 2, 3
df <- df %>% 
  group_by(location, year, group1, value) %>% 
  mutate(group2 = cumsum(group1) / group1) %>% 
  arrange(location, year, group1, value, group2)
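
The cumsum(group1) / group1 step relies on every row in a group sharing the same non-zero group1. A counter that ignores the values of group1 may be more robust. The following is a rough base R sketch of that idea, assuming ave() with seq_along as the within-group counter; the name expanded is illustrative and does not come from the code above.

```r
df <- data.frame(
  location = c("a", "a", "a", "a", "a", "a", "b", "b", "b"),
  year     = c(2020, 2020, 2020, 2021, 2021, 2021, 2020, 2020, 2020),
  group1   = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  value    = c("x", "y", "z", "x", "y", "z", "x", "y", "z")
)

# Repeat every row three times, keeping the copies next to each other
expanded <- df[rep(seq_len(nrow(df)), each = 3), ]
rownames(expanded) <- NULL

# Number the rows within each location/year/group1 combination
expanded$group2 <- ave(
  seq_len(nrow(expanded)),
  expanded$location, expanded$year, expanded$group1,
  FUN = seq_along
)

head(expanded)
```

In dplyr, the corresponding step after group_by() would be mutate(group2 = row_number()).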

Hope it works

Upvotes: 0

Ben

Reputation: 30474

With tidyr you can use expand - this will expand your data frame to all combinations of values with your sequence of 1 to 3:

library(tidyverse)

df %>%
  group_by(Location, year, group1, Value) %>%
  expand(group2 = 1:3)

Output

   Location  year group1 Value group2
   <fct>    <dbl>  <int> <fct>  <int>
 1 a         2020      1 x          1
 2 a         2020      1 x          2
 3 a         2020      1 x          3
 4 a         2020      2 y          1
 5 a         2020      2 y          2
 6 a         2020      2 y          3
 ...

Your approach looks close, and I suppose you could just add on group2 like this:

cbind(df[rep(seq_len(nrow(df)), each = 3), ], group2 = 1:3)
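
The each = 3 matters here. As a small sketch of why (the names stacked and blocked are made up for illustration, and the data frame is retyped from the question), compare what happens when the recycled 1:3 is attached to both versions of the row index:

```r
df <- data.frame(
  Location = c("a", "a", "a", "a", "a", "a", "b", "b", "b"),
  year     = c(2020, 2020, 2020, 2021, 2021, 2021, 2020, 2020, 2020),
  group1   = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  Value    = c("x", "y", "z", "x", "y", "z", "x", "y", "z")
)

# times = 3 (the question's index): the whole frame is stacked three times
stacked <- cbind(df[rep(seq_len(nrow(df)), 3), ], group2 = 1:3)

# each = 3: every row is repeated three times back to back
blocked <- cbind(df[rep(seq_len(nrow(df)), each = 3), ], group2 = 1:3)

# Count the distinct Location/year/group1/Value/group2 combinations
nrow(unique(stacked))
nrow(unique(blocked))
```

With the stacked index, each original row always meets the same recycled group2 value, so only 9 distinct combinations appear; with each = 3 all 27 do.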

Upvotes: 1
