Prometheus
Prometheus

Reputation: 2017

dcast with data.table in a function

I am trying to turn the code below, which already works, into a function.

A similar situation, dcast + DT, has already been disscused here! But i've wasnt able to solve the problem like that.

What I want to achieve is:

This is the code that works already:

result1 <- dcast(setDT(data), customer_id ~ paste0("num_of_oranges",period), value.var = "num_of_oranges", sum)
result2 <- dcast(setDT(data), customer_id ~ paste0("num_of_oranges",period) + paste0("SIGN_",sign), value.var = "num_of_oranges", sum)
result3 <- dcast(setDT(data), customer_id ~ paste0("num_of_oranges",period) + paste0("SIGN_",sign) + paste0("ORIGIN_",origin), value.var = "num_of_oranges", sum)

My attempt towards the function:

create.Feature <- function(col1, stat) {

  test1 <- dcast(df, df[[id]] ~ paste0("col1",df[[period]]), value.var = df[["col1"]], stat)  
 return(test1)
  test2 <- dcast(df, df[[id]] ~ paste0("col1",df[[period]]) + paste0("SIGN",df[[sign]]), value.var = df[["col1"]], stat)
  return(test2)
  test3 <- dcast(df, df[[id]] ~ paste0("col1",df[[period]]) + paste0("SIGN",df[[sign]]) + paste0("ORIGIN",df[[origin]]), value.var = df[["col1"]], stat)
  return(test3)

And the call:

test_result <- create.Feature("num_of_oranges", sum)

I get the following error: Error in .subset2(x, i, exact = exact) : no such index at level 1

Anyone?

Upvotes: 0

Views: 1193

Answers (1)

lao_zhang
lao_zhang

Reputation: 178

I tried using the mtcars dataset to reproduce your function.

Code:

cars <- mtcars

result1 <- dcast(setDT(cars), cyl ~ paste0("disp", gear), 
                 value.var = "disp", 
                 sum)
result2 <- dcast(setDT(cars), cyl ~ paste0("disp", gear) + 
                       paste0("am", am),
                 value.var = "disp", 
                 sum)
result3 <- dcast(setDT(cars), cyl ~ paste0("disp", gear) + 
                       paste0("am", am) +
                       paste0("vs", vs),
                 value.var = "disp", 
                 sum)

create.Feature <- function(df, id, col1) {
      test1 <- dcast(df,
                     df[[id]] ~ paste0(col1, df[["gear"]]),
                     value.var = col1,
                     sum)
      test2 <- dcast(df,
                     df[[id]] ~ paste0(col1, df[["gear"]]) + 
                           paste0("am", df[["am"]]),
                     value.var = col1,
                     sum)
      test3 <- dcast(df,
                     df[[id]] ~ paste0(col1, df[["gear"]]) +
                           paste0("am", df[["am"]]) +
                           paste0("vs", df[["vs"]]),
                     value.var = col1,
                     sum)
      list(test1, test2, test3)
}

tr <- create.Feature(df = cars, 
                     id = "cyl", 
                     col1 = "disp")

Output:

tr
[[1]]
   df  disp3 disp4 disp5
1:  4  120.1 821.0 215.4
2:  6  483.0 655.2 145.0
3:  8 4291.4   0.0 652.0

[[2]]
   df disp3_am0 disp4_am0 disp4_am1 disp5_am1
1:  4     120.1     287.5     533.5     215.4
2:  6     483.0     335.2     320.0     145.0
3:  8    4291.4       0.0       0.0     652.0

[[3]]
   df disp3_am0_vs0 disp3_am0_vs1 disp4_am0_vs1 disp4_am1_vs0
1:  4           0.0         120.1         287.5             0
2:  6           0.0         483.0         335.2           320
3:  8        4291.4           0.0           0.0             0
   disp4_am1_vs1 disp5_am1_vs0 disp5_am1_vs1
1:         533.5         120.3          95.1
2:           0.0         145.0           0.0
3:           0.0         652.0           0.0

A few points though:

  1. You hard-coded some of the variables into the function (I assume), e.g. df[[sign]] and df[[origin]], which I did the same.
  2. I can't seem to get the stat into the function, that's why I added sum into the function instead of stat. I can't figure out what is the problem. I tried match.fun() and do.call, just can't seem to get it to work.
  3. In your function, test3 was the last statement, I assumed you want all three test1, test2 and test3, so I combined them into a list and let that be the output (last statement).

Not sure if this is what you want, if not, hope you'll get it soon. I personally don't use data.table, I use more of dplyr.

Upvotes: 1

Related Questions