user3067851
user3067851

Reputation: 534

reshape and aggregate datatable

I asked a very similar question and because I haven't quite gotten a handle on tidyr or reshape I have to ask another question. I have a datatable containing repeat id values (see below):

id  Product NI  
1   Direct Auto 15  
2   Direct Auto 15  
3   Direct Auto 15  
4   Direct Auto 15  
5   Direct Auto 15  
6   Direct Auto 15  
6   Mortgage    50  
9   Direct Auto 15  
10  Direct Auto 15  
11  Direct Auto 15  
12  Direct Auto 15  
13  Direct Auto 15  
14  Direct Auto 15  
15  Direct Auto 15  
16  Direct Auto 15  
1   Mortgage    50  
5   Personal    110  
19  Direct Auto 15  
20  Direct Auto 15  
1   Direct Auto 15 

I would like the id aggregated to one row, the Product column to be 'spread' so that the values become variables, another variable containing the aggregated count of each Product by id, and the NI to be summed for each of the product groups by ID. So see example below:

id  DirectAuto  DA_NI   Mortgage    Mortgage_NI Personal    P_NI  
1   2   30  1   50  NA  NA  
2   1   15  NA  NA  NA  NA  
3   1   15  NA  NA  NA  NA  
4   1   15  NA  NA  NA  NA  
5   1   15  NA  NA  1   110  
6   1   15  1   50  NA  NA  
9   1   15  NA  NA  NA  NA  
11  1   15  NA  NA  NA  NA  
12  1   15  NA  NA  NA  NA  
13  1   15  NA  NA  NA  NA  
14  1   15  NA  NA  NA  NA  
15  1   15  NA  NA  NA  NA  
16  1   15  NA  NA  NA  NA  
19  1   15  NA  NA  NA  NA  
20  1   15  NA  NA  NA  NA 

For example, id 1 has 2 Direct Auto, so his DA_NI is 30 and he has 1 Mortgage so his NI is Mortgage_NI = 50.

So, basically make a 'wider' datatable. I'm still reading and practicing tidyr and reshape, but in the mean-time maybe someone can help.
Here is some of my starting code:

df[, .(tot = .N, NI = sum(NI)), by = c("id","Product")]

Afterwards, using some tidyr & reshape commands I can't seem to get the final output I want.

Upvotes: 1

Views: 117

Answers (2)

Arun
Arun

Reputation: 118779

data.table v1.9.5 has more nicer features for melting and casting. Using dcast from the devel version:

require(data.table) # v1.9.5
dcast(dt, id ~ Product, fun.agg = list(sum, length), value.var="NI", fill=NA)

I think this is what you're looking for. You can checkout the new HTML vignettes here.

Rename the columns to your liking.

Upvotes: 1

Nick Kennedy
Nick Kennedy

Reputation: 12640

It's a little tricky to do this. It can be done using tidyr and dplyr though it goes against Hadley Wickgam's tidy data principles.

dat %>%
  group_by(id, Product) %>%
  summarise(NI = sum(NI), n = n()) %>%
  gather(variable, value, n, NI) %>%
  mutate(
    col_name = ifelse(variable == "n",
      as.character(Product),
      paste(Product, variable, sep = "_"))
  ) %>%
  select(-c(Product, variable)) %>%
  spread(col_name, value)

Upvotes: 0

Related Questions