BigFinger
BigFinger

Reputation: 1043

How do I do a complex computation on columns and rows of a data table?

I am learning the syntax of manipulating data.table variables. Although I can do simple things, my understanding is not thorough enough for more complex tasks. For example, I would like to transform the following data to have one distinct "type" value per row, separate columns generated based on the value of "subtype", and unique values collapsed when there are multiple rows with the same "type/subtype" combination.

Given the input data:

data = data.frame(
    var1 = c("a","b","c","b","d","e","f"),
    var2 = c("aa","bb","cc","dd","ee","ee","ff"),
    subtype = c("1","2","2","2","1","1","2"),
    type = c("A","A","A","A","B","B","B")
    )

  var1 var2 subtype type
1    a   aa       1    A
2    b   bb       2    A
3    c   cc       2    A
4    b   dd       2    A
5    d   ee       1    B
6    e   ee       1    B
7    f   ff       2    B

I would like to derive:

  1.var1 1.var2 2.var1 2.var2     2.type
A "a"    "aa"   "b|c"  "bb|cc|dd" "A"   
B "d|e"  "ee"   "f"    "ff"       "B"   

Using a data frame, I can achieve this with the following code:

data.derived = do.call(
    rbind,
    lapply(
        split(data,list(data$type)),
        function(x) {
            do.call (
                c,
                lapply(
                    split(x, list(x$subtype)),
                    function(y) {
                        result = c(
                            var1 = paste(unique(y$var1),collapse ="|"),
                            var2 = paste(unique(y$var2),collapse ="|")
                        )
                        if (as.character(y$subtype[1]) == "2") {
                            result = c(result, type = as.character(y$type[1]))
                        }
                        result}))}))

How can I do the same using a data table?

Upvotes: 2

Views: 82

Answers (2)

akuiper
akuiper

Reputation: 214927

From your result, it's clear to see that you are transforming data from long to wide format and subtype is spread along the row direction, so you will need dcast from data.table. And since your want to aggregate your values from var1 and var2 to be a single string, you will need to customize the aggregate function as paste to collapse the result:

library(data.table)
setDT(data)
dcast(data, type ~ subtype, value.var = c("var1", "var2"), 
            fun = function(v) paste0(unique(v), collapse = "|"))

#    type var1_function_1 var1_function_2 var2_function_1 var2_function_2
# 1:    A               a             b|c              aa        bb|cc|dd
# 2:    B             d|e               f              ee              ff

Upvotes: 5

AntoniosK
AntoniosK

Reputation: 16121

Not sure if you want to use data.table package and commands, or if you want to find out whether your code works with data tables as well.

I think complex computations require the usage of the appropriate packages. The above script works for you, but it's hard to see what it does if it's not written by you.

Before you start using data.table check some nice packages that make life easier for you. Like

library(dplyr)
library(tidyr)

data = data.frame(
  var1 = c("a","b","c","b","d","e","f"),
  var2 = c("aa","bb","cc","dd","ee","ee","ff"),
  subtype = c("1","2","2","2","1","1","2"),
  type = c("A","A","A","A","B","B","B")
)

data %>% 
  group_by(type, subtype) %>%
  summarise(x1 = paste(unique(var1),collapse ="|"),
            x2 = paste(unique(var2),collapse ="|")) %>%
  unite(xx,x1,x2) %>%
  spread(subtype,xx) %>%
  separate(`1`, c("1.var1","1.var2"), sep="_") %>%
  separate(`2`, c("2.var1","2.var2"), sep="_") %>%
  ungroup

# # A tibble: 2 x 5
#      type 1.var1 1.var2 2.var1   2.var2
#  * <fctr>  <chr>  <chr>  <chr>    <chr>
# 1      A      a     aa    b|c bb|cc|dd
# 2      B    d|e     ee      f       ff

You can use the same code, or even your script, when having a data table instead of a data frame. But if you're looking for using data table commands that's a different story.

Upvotes: 1

Related Questions