Reputation: 3195
Here part of mydata
mydat=structure(list(channel_id = c(219038L, 1755L, 1755L, 219038L,
1755L, 1755L, 1755L, 1755L, 219038L, 1755L, 1755L, 1755L, 219038L,
1755L, 1755L, 1755L), multifr_type = c(0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), offer_category_id = c(718L,
718L, 718L, 719L, 718L, 719L, 719L, 718L, 718L, 719L, 1616L,
718L, 718L, 719L, 720L, 65L), adapter_id = c(3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 45L, 3L, 3L, 3L, 3L, 30L), adapter_id2 = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
airline1 = c(238L, 238L, 238L, 156L, 238L, 156L, 156L, 238L,
238L, 156L, 238L, 238L, 238L, 156L, 156L, 757L), airline2 = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L
), meta_ui_type = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 1L, 1L), offer_flight_type_category_id = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), discount_category_id = c(1L, 1L, 6L, 1L, 11L, 3L, 1L,
6L, 2L, 6L, 1L, 2L, 6L, 2L, 2L, 1L), flight_area = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), count_sessions = c(15203L, 13297L, 12026L, 10575L, 10459L,
10306L, 9632L, 8623L, 7343L, 7298L, 6679L, 6236L, 4180L,
4163L, 3986L, 3923L), count_orders = c(907L, 3264L, 2400L,
426L, 1830L, 1787L, 1690L, 2119L, 501L, 1503L, 1942L, 1420L,
346L, 872L, 1100L, 474L), conversion = c(0.06, 0.245, 0.2,
0.04, 0.175, 0.173, 0.175, 0.246, 0.068, 0.206, 0.291, 0.228,
0.083, 0.209, 0.276, 0.121), offer_cat_id_clust = c(718L,
718L, 718L, 719L, 718L, 719L, 719L, 718L, 718L, 719L, 1616L,
718L, 718L, 719L, 720L, 65L)), class = "data.frame", row.names = c(NA,
-16L))
I need for variables channel_id+multifr_type+adapter_id+adapter_id2+airline1+airline2+meta_ui_type+offer_flight_type_category_id+discount_category_id+flight_area
calculate sum for variables count_sessions,count_orders
and mean for conversion
by each category.
Then the result of such aggregation add in mydat2
and for offer_category_id,offer_cat_id_clust
set category - "other"
For example, to be more clear, take this category for agregation
channel_id multifr_type offer_category_id adapter_id adapter_id2 airline1 airline2 meta_ui_type offer_flight_type_category_id discount_category_id flight_area count_sessions count_orders conversion offer_cat_id_clust
219038 0 718 3 0 238 0 0 1 1 1 15203 907 0,06 718
219038 0 718 3 0 238 0 0 1 1 1 13297 3264 0,245 718
aggreagate rows for offer_category_id=718
by mention above variables and instead 718 set "other"
so with adding "other" category for offer_category_id,offer_cat_id_clust
desired result looks like
*channel_id multifr_type offer_category_id adapter_id adapter_id2 airline1 airline2 meta_ui_type offer_flight_type_category_id discount_category_id flight_area count_sessions count_orders conversion offer_cat_id_clust
219038 0 other 3 0 238 0 0 1 1 1 28500 4171 0,152000 other*
How can i do it such restructured aggregation?
Upvotes: 0
Views: 42
Reputation: 8844
Something like this?
library(dplyr)
df %>%
group_by(
channel_id, multifr_type, adapter_id,
adapter_id2, airline1, airline2,
meta_ui_type, offer_flight_type_category_id,
discount_category_id, flight_area
) %>%
summarise(
offer_category_id = "other",
count_sessions = sum(count_sessions),
count_orders = sum(count_orders),
conversion = mean(conversion),
offer_cat_id_clust = "other"
)
Upvotes: 1