Reputation: 534
I asked a very similar question and because I haven't quite gotten a handle on tidyr or reshape I have to ask another question. I have a datatable containing repeat id values (see below):
id Product NI
1 Direct Auto 15
2 Direct Auto 15
3 Direct Auto 15
4 Direct Auto 15
5 Direct Auto 15
6 Direct Auto 15
6 Mortgage 50
9 Direct Auto 15
10 Direct Auto 15
11 Direct Auto 15
12 Direct Auto 15
13 Direct Auto 15
14 Direct Auto 15
15 Direct Auto 15
16 Direct Auto 15
1 Mortgage 50
5 Personal 110
19 Direct Auto 15
20 Direct Auto 15
1 Direct Auto 15
I would like the id aggregated to one row, the Product column to be 'spread' so that the values become variables, another variable containing the aggregated count of each Product by id, and the NI to be summed for each of the product groups by ID. So see example below:
id DirectAuto DA_NI Mortgage Mortgage_NI Personal P_NI
1 2 30 1 50 NA NA
2 1 15 NA NA NA NA
3 1 15 NA NA NA NA
4 1 15 NA NA NA NA
5 1 15 NA NA 1 110
6 1 15 1 50 NA NA
9 1 15 NA NA NA NA
11 1 15 NA NA NA NA
12 1 15 NA NA NA NA
13 1 15 NA NA NA NA
14 1 15 NA NA NA NA
15 1 15 NA NA NA NA
16 1 15 NA NA NA NA
19 1 15 NA NA NA NA
20 1 15 NA NA NA NA
For example, id 1 has 2 Direct Auto, so his DA_NI is 30 and he has 1 Mortgage so his NI is Mortgage_NI = 50.
So, basically make a 'wider' datatable. I'm still reading and practicing tidyr and reshape, but in the mean-time maybe someone can help.
Here is some of my starting code:
df[, .(tot = .N, NI = sum(NI)), by = c("id","Product")]
Afterwards, using some tidyr & reshape commands I can't seem to get the final output I want.
Upvotes: 1
Views: 117
Reputation: 118779
data.table v1.9.5
has more nicer features for melting and casting. Using dcast
from the devel version:
require(data.table) # v1.9.5
dcast(dt, id ~ Product, fun.agg = list(sum, length), value.var="NI", fill=NA)
I think this is what you're looking for. You can checkout the new HTML vignettes here.
Rename the columns to your liking.
Upvotes: 1
Reputation: 12640
It's a little tricky to do this. It can be done using tidyr
and dplyr
though it goes against Hadley Wickgam's tidy data principles.
dat %>%
group_by(id, Product) %>%
summarise(NI = sum(NI), n = n()) %>%
gather(variable, value, n, NI) %>%
mutate(
col_name = ifelse(variable == "n",
as.character(Product),
paste(Product, variable, sep = "_"))
) %>%
select(-c(Product, variable)) %>%
spread(col_name, value)
Upvotes: 0