Reputation: 1043
I am learning the syntax of manipulating data.table variables. Although I can do simple things, my understanding is not thorough enough for more complex tasks. For example, I would like to transform the following data to have one distinct "type" value per row, separate columns generated based on the value of "subtype", and unique values collapsed when there are multiple rows with the same "type/subtype" combination.
Given the input data:
data = data.frame(
var1 = c("a","b","c","b","d","e","f"),
var2 = c("aa","bb","cc","dd","ee","ee","ff"),
subtype = c("1","2","2","2","1","1","2"),
type = c("A","A","A","A","B","B","B")
)
var1 var2 subtype type
1 a aa 1 A
2 b bb 2 A
3 c cc 2 A
4 b dd 2 A
5 d ee 1 B
6 e ee 1 B
7 f ff 2 B
I would like to derive:
1.var1 1.var2 2.var1 2.var2 2.type
A "a" "aa" "b|c" "bb|cc|dd" "A"
B "d|e" "ee" "f" "ff" "B"
Using a data frame, I can achieve this with the following code:
data.derived = do.call(
rbind,
lapply(
split(data,list(data$type)),
function(x) {
do.call (
c,
lapply(
split(x, list(x$subtype)),
function(y) {
result = c(
var1 = paste(unique(y$var1),collapse ="|"),
var2 = paste(unique(y$var2),collapse ="|")
)
if (as.character(y$subtype[1]) == "2") {
result = c(result, type = as.character(y$type[1]))
}
result}))}))
How can I do the same using a data table?
Upvotes: 2
Views: 82
Reputation: 214927
From your result, it's clear to see that you are transforming data from long to wide format and subtype is spread along the row direction, so you will need dcast
from data.table
. And since your want to aggregate your values from var1
and var2
to be a single string, you will need to customize the aggregate function as paste
to collapse the result:
library(data.table)
setDT(data)
dcast(data, type ~ subtype, value.var = c("var1", "var2"),
fun = function(v) paste0(unique(v), collapse = "|"))
# type var1_function_1 var1_function_2 var2_function_1 var2_function_2
# 1: A a b|c aa bb|cc|dd
# 2: B d|e f ee ff
Upvotes: 5
Reputation: 16121
Not sure if you want to use data.table package and commands, or if you want to find out whether your code works with data tables as well.
I think complex computations require the usage of the appropriate packages. The above script works for you, but it's hard to see what it does if it's not written by you.
Before you start using data.table check some nice packages that make life easier for you. Like
library(dplyr)
library(tidyr)
data = data.frame(
var1 = c("a","b","c","b","d","e","f"),
var2 = c("aa","bb","cc","dd","ee","ee","ff"),
subtype = c("1","2","2","2","1","1","2"),
type = c("A","A","A","A","B","B","B")
)
data %>%
group_by(type, subtype) %>%
summarise(x1 = paste(unique(var1),collapse ="|"),
x2 = paste(unique(var2),collapse ="|")) %>%
unite(xx,x1,x2) %>%
spread(subtype,xx) %>%
separate(`1`, c("1.var1","1.var2"), sep="_") %>%
separate(`2`, c("2.var1","2.var2"), sep="_") %>%
ungroup
# # A tibble: 2 x 5
# type 1.var1 1.var2 2.var1 2.var2
# * <fctr> <chr> <chr> <chr> <chr>
# 1 A a aa b|c bb|cc|dd
# 2 B d|e ee f ff
You can use the same code, or even your script, when having a data table instead of a data frame. But if you're looking for using data table commands that's a different story.
Upvotes: 1