Reputation: 3071
Let's say I have a data.frame/table which I want to "expand", such that each row transforms into a sequence of repeat rows (which can be described as a group), except for a new column which will have the same values in all groups.
So, taking the following dataframe, we want to expand it by column id
, such that each row is now repeated, but with a new column q
which repeats from 1-4 for each group.
The way I've found to do this is to build a second dataframe based on the first one's column id
and with the new column values, and then joining it to the original:
library(dplyr)
set.seed(42)
a <- data.frame(id = 1:5,
value = runif(5))
a
#> id value
#> 1 1 0.9148060
#> 2 2 0.9370754
#> 3 3 0.2861395
#> 4 4 0.8304476
#> 5 5 0.6417455
b <- data.frame(id = rep(a$id, each = 4),
q = 1:4)
left_join(a, b, by = "id")
#> id value q
#> 1 1 0.9148060 1
#> 2 1 0.9148060 2
#> 3 1 0.9148060 3
#> 4 1 0.9148060 4
#> 5 2 0.9370754 1
#> 6 2 0.9370754 2
#> 7 2 0.9370754 3
#> 8 2 0.9370754 4
#> 9 3 0.2861395 1
#> 10 3 0.2861395 2
#> 11 3 0.2861395 3
#> 12 3 0.2861395 4
#> 13 4 0.8304476 1
#> 14 4 0.8304476 2
#> 15 4 0.8304476 3
#> 16 4 0.8304476 4
#> 17 5 0.6417455 1
#> 18 5 0.6417455 2
#> 19 5 0.6417455 3
#> 20 5 0.6417455 4
Created on 2020-01-27 by the reprex package (v0.3.0)
Is there a more direct way of doing this? In my example above I've used dplyr
, but I can just as easily adopt data.table
syntax if that's easier.
Upvotes: 3
Views: 286
Reputation: 2164
This is easily accomplished with the function crossing
from the tidyr
package.
library(tidyr)
a <- tibble(
id = 1:5, value = runif(5)
)
crossing(a, q = 1:4)
#> # A tibble: 20 x 3
#> id value q
#> <int> <dbl> <int>
#> 1 1 0.222 1
#> 2 1 0.222 2
#> 3 1 0.222 3
#> 4 1 0.222 4
#> 5 2 0.262 1
#> 6 2 0.262 2
#> 7 2 0.262 3
#> 8 2 0.262 4
#> 9 3 0.284 1
#> # … with 10 more rows
Created on 2020-01-27 by the reprex package (v0.3.0)
Upvotes: 3
Reputation: 33743
A data.table
alternative:
setDT(a)
a[CJ(id = id, q = 1:4), on = .(id)]
Base R
:
expand_grid(a, q = 1:4)
Upvotes: 2
Reputation: 887951
If we need to rep
licate the rows, it can be easily done with uncount
library(dplyr)
library(tidyr)
library(data.table)
uncount(a, 4) %>%
mutate(q = rowid(id))
Or another option is create a list
column of 'q' and then unnest
a %>%
mutate(q = list(1:4)) %>%
unnest(q)
Or in base R
transform(a[rep(seq_len(nrow(a)), 4), ], q = sequence(table(id)))
Or using data.table
library(data.table)
setDT(a)[, .(q = 1:4),.(id, value)]
Or replicate the rows first
setDT(a)[rep(seq_len(.N), .N)][, q := rep(1:4, length.out = .N)][]
Upvotes: 3