psysky
psysky

Reputation: 3195

restructured aggregation in R with multiple variables

Here part of mydata

mydat=structure(list(channel_id = c(219038L, 1755L, 1755L, 219038L, 
1755L, 1755L, 1755L, 1755L, 219038L, 1755L, 1755L, 1755L, 219038L, 
1755L, 1755L, 1755L), multifr_type = c(0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), offer_category_id = c(718L, 
718L, 718L, 719L, 718L, 719L, 719L, 718L, 718L, 719L, 1616L, 
718L, 718L, 719L, 720L, 65L), adapter_id = c(3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 45L, 3L, 3L, 3L, 3L, 30L), adapter_id2 = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), 
    airline1 = c(238L, 238L, 238L, 156L, 238L, 156L, 156L, 238L, 
    238L, 156L, 238L, 238L, 238L, 156L, 156L, 757L), airline2 = c(0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L
    ), meta_ui_type = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 1L, 1L), offer_flight_type_category_id = c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
    ), discount_category_id = c(1L, 1L, 6L, 1L, 11L, 3L, 1L, 
    6L, 2L, 6L, 1L, 2L, 6L, 2L, 2L, 1L), flight_area = c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
    ), count_sessions = c(15203L, 13297L, 12026L, 10575L, 10459L, 
    10306L, 9632L, 8623L, 7343L, 7298L, 6679L, 6236L, 4180L, 
    4163L, 3986L, 3923L), count_orders = c(907L, 3264L, 2400L, 
    426L, 1830L, 1787L, 1690L, 2119L, 501L, 1503L, 1942L, 1420L, 
    346L, 872L, 1100L, 474L), conversion = c(0.06, 0.245, 0.2, 
    0.04, 0.175, 0.173, 0.175, 0.246, 0.068, 0.206, 0.291, 0.228, 
    0.083, 0.209, 0.276, 0.121), offer_cat_id_clust = c(718L, 
    718L, 718L, 719L, 718L, 719L, 719L, 718L, 718L, 719L, 1616L, 
    718L, 718L, 719L, 720L, 65L)), class = "data.frame", row.names = c(NA, 
-16L))

I need for variables channel_id+multifr_type+adapter_id+adapter_id2+airline1+airline2+meta_ui_type+offer_flight_type_category_id+discount_category_id+flight_area calculate sum for variables count_sessions,count_orders and mean for conversion by each category. Then the result of such aggregation add in mydat2 and for offer_category_id,offer_cat_id_clust set category - "other"

For example, to be more clear, take this category for agregation

channel_id  multifr_type    offer_category_id   adapter_id  adapter_id2 airline1    airline2    meta_ui_type    offer_flight_type_category_id   discount_category_id    flight_area count_sessions  count_orders    conversion  offer_cat_id_clust
219038  0   718 3   0   238 0   0   1   1   1   15203   907 0,06    718
219038  0   718 3   0   238 0   0   1   1   1   13297   3264    0,245   718

aggreagate rows for offer_category_id=718 by mention above variables and instead 718 set "other" so with adding "other" category for offer_category_id,offer_cat_id_clust desired result looks like

*channel_id multifr_type    offer_category_id   adapter_id  adapter_id2 airline1    airline2    meta_ui_type    offer_flight_type_category_id   discount_category_id    flight_area count_sessions  count_orders    conversion  offer_cat_id_clust
219038  0   other   3   0   238 0   0   1   1   1   28500   4171    0,152000    other*

How can i do it such restructured aggregation?

Upvotes: 0

Views: 42

Answers (1)

ekoam
ekoam

Reputation: 8844

Something like this?

library(dplyr)
df %>% 
  group_by(
    channel_id, multifr_type, adapter_id, 
    adapter_id2, airline1, airline2, 
    meta_ui_type, offer_flight_type_category_id, 
    discount_category_id, flight_area
  ) %>% 
  summarise(
    offer_category_id = "other",
    count_sessions = sum(count_sessions), 
    count_orders = sum(count_orders), 
    conversion = mean(conversion), 
    offer_cat_id_clust = "other"
  ) 

Upvotes: 1

Related Questions