Reputation: 18426
Let's consider following snippet which has a data.table
, and I'm calculating some aggregates, and I've added the sample output as well:
data <- data.table(col=sample(c("col1", "col2", "col3"), 20, replace=TRUE),
id=as.character(sample(1:5, 20, replace=TRUE)),
sex=sample(c("m", "f"), 20, replace=TRUE),
val=sample.int(100, 20)
)
res <- data[,.(n=.N, mean=mean(val), total=sum(val)), by=.(col, id, sex)]
# res
A data.table: 13 × 6
col id sex n mean total
<chr> <chr> <chr> <int> <dbl> <int>
col2 5 f 3 39.33333 118
col1 3 f 2 18.50000 37
col3 5 f 1 7.00000 7
col2 4 f 1 46.00000 46
col1 2 f 2 29.00000 58
col1 5 f 2 37.00000 74
col2 1 m 1 37.00000 37
col1 3 m 2 29.50000 59
col1 4 m 1 26.00000 26
col2 4 m 2 54.00000 108
col3 5 m 1 60.00000 60
col2 2 f 1 70.00000 70
col2 5 m 1 29.00000 29
After calculating the aggregate, I'm pivoting the data.table on col
column using dcast
which results something like below:
dcast(res, id+sex~col, value.var=c("n", "mean", "total"))
#output:
A data.table: 9 × 11
id sex n_col1 n_col2 n_col3 mean_col1 mean_col2 mean_col3 total_col1 total_col2 total_col3
<chr> <chr> <int> <int> <int> <dbl> <dbl> <dbl> <int> <int> <int>
1 f NA 1 1 NA 31 32.0 NA 31 32
1 m 1 1 NA 25 12 NA 25 12 NA
2 f 2 1 NA 48 51 NA 96 51 NA
2 m 1 2 NA 96 80 NA 96 160 NA
3 m NA NA 2 NA NA 40.5 NA NA 81
4 f 2 1 NA 43 22 NA 86 22 NA
4 m NA 2 NA NA 43 NA NA 86 NA
5 f 1 1 NA 19 85 NA 19 85 NA
5 m 1 NA NA 23 NA NA 23 NA NA
Now, the question is, how can we stack id
, and sex
columns together to create a single column, with additional level
column which tells the level for the row along with NA
values for outer column rows (indicated by <- below), following is the output I'm looking for:
newCol level n_col1 n_col2 n_col3 mean_col1 mean_col2 mean_col3 total_col1 total_col2 total_col3
<chr> <int> <int> <int> <int> <dbl> <dbl> <dbl> <int> <int> <int>
1 1 NA NA NA NA NA NA NA NA NA <-
f 2 NA 1 1 NA 31 32.0 NA 31 32
m 2 1 1 NA 25 12 NA 25 12 NA
2 1 NA NA NA NA NA NA NA NA NA <-
f 2 2 1 NA 48 51 NA 96 51 NA
m 2 1 2 NA 96 80 NA 96 160 NA
3 1 NA NA NA NA NA NA NA NA NA <-
m 2 NA NA 2 NA NA 40.5 NA NA 81
4 1 NA NA NA NA NA NA NA NA NA <-
f 2 2 1 NA 43 22 NA 86 22 NA
m 2 NA 2 NA NA 43 NA NA 86 NA
5 1 NA NA NA NA NA NA NA NA NA <-
f 2 1 1 NA 19 85 NA 19 85 NA
m 2 1 NA NA 23 NA NA 23 NA NA
Ideally, there can be more columns to stack and level can go up to 5. for two levels 1, and 2, NA
rows are required only for level 1, but for 4 levels (meaning 4 stacking columns), NA
rows are required for level 1, 2, and 3.
The straight forward appraoch is to iterate the unique values, and keep joining the dataframes which is ofcourse not going to be an efficient approach. You can find the Python pandas based question and answer for the above problem at: Pandas stack multiple columns to a single column
Upvotes: 2
Views: 341
Reputation: 27732
# store your dcast result
res.wide <- dcast(res, id+sex~col, value.var=c("n", "mean", "total"))
# split by id
L <- split(res.wide, by = "id", keep.by = FALSE)
# rbind empy row
rbindlist(
lapply(seq.int(L),
function(x) rbind(data.table(sex = names(L)[[x]]),
L[[x]],
fill = TRUE))
)
# sex n_col1 n_col2 n_col3 mean_col1 mean_col2 mean_col3 total_col1 total_col2 total_col3
# 1: 1 NA NA NA NA NA NA NA NA NA
# 2: f 1 NA 1 23.00000 NA 64.00000 23 NA 64
# 3: m 3 NA NA 87.33333 NA NA 262 NA NA
# 4: 2 NA NA NA NA NA NA NA NA NA
# 5: f NA 1 1 NA 87 31.00000 NA 87 31
# 6: m 1 NA NA 79.00000 NA NA 79 NA NA
# 7: 3 NA NA NA NA NA NA NA NA NA
# 8: f NA 2 NA NA 23 NA NA 46 NA
# 9: m 1 NA NA 25.00000 NA NA 25 NA NA
#10: 4 NA NA NA NA NA NA NA NA NA
#11: f NA NA 1 NA NA 52.00000 NA NA 52
#12: m 1 1 NA 30.00000 37 NA 30 37 NA
#13: 5 NA NA NA NA NA NA NA NA NA
#14: f NA 2 3 NA 31 63.66667 NA 62 191
#15: m NA NA 1 NA NA 12.00000 NA NA 12
Upvotes: 4