Reputation: 1179
Here is a piece of my data:
data_x <- tribble(
~price, ~bokey, ~id, ~cost, ~revenue,
1, "a", 10, 0.20, 30,
2, "b", 20, 0.30, 60,
3, "c", 20, 0.30, 40,
4, "d", 10, 0.20, 100,
5, "e", 30, 0.10, 40,
6, "f", 10, 0.20, 10,
1, "g", 20, 0.30, 80,
2 , "h", 10, 0.20, 20,
3, "h", 30, 0.10, 20,
3, "i", 20, 0.30, 40,
)
As you see, there are three different type of IDs: 10, 20, 30. But in the real data, there are almost 100 ids. I want to aggregate the data based on these ids. Because I don't know how to do it in loop, I basically created some subsets:
data_10 <- data_x %>% filter(id == 10)
data_20 <- data_x %>% filter(id == 20)
data_30 <- data_x %>% filter(id == 30)
Here is the aggregated data:
data_agg <- data_10 %>%
group_by(priceseg = cut(as.numeric(price), c(0, 1, 3, 5, 6))) %>%
summarise(price_n = n_distinct(bokey),
Cost = sum(cost, na.rm = T),
Revenue = sum(revenue, na.rm = T),
clicks = n_distinct(bokey)) %>%
mutate(price_n2 = round(100 * prop.table(price_n), 2),
(zet = Cost/Revenue))
But I want to have one more column that shows the id. Here is the desired data:
data_desired <- tribble(
~id, ~priceseg, ~price_n, ~Cost, ~Revenue, ~clicks, ~price_n2, ~`(zet = Cost/Revenue)`
10, (0,1] 1 0.2 30 1 25 0.00667
10, (1,3] 1 0.2 20 1 25 0.01
10, (3,5] 1 0.2 100 1 25 0.002
10, (5,6] 1 0.2 10 1 25 0.02
20,
20,
.
.
) 30,
How can I get it?
Upvotes: 1
Views: 72
Reputation: 887851
An option is to split
and loop over with map
while specifying the .id
library(dplyr)
library(purrr)
data_x %>%
split(.$id) %>%
map_dfr(~
.x %>%
group_by(priceseg = cut(as.numeric(price), c(0, 1, 3, 5, 6))) %>%
summarise(price_n = n_distinct(bokey),
Cost = sum(cost, na.rm = T),
Revenue = sum(revenue, na.rm = T),
clicks = n_distinct(bokey)) %>%
mutate(price_n2 = round(100 * prop.table(price_n), 2),
(zet = Cost/Revenue)), .id = "id" )
# A tibble: 8 x 8
# id priceseg price_n Cost Revenue clicks price_n2 `(zet = Cost/Revenue)`
# <chr> <fct> <int> <dbl> <dbl> <int> <dbl> <dbl>
#1 10 (0,1] 1 0.2 30 1 25 0.00667
#2 10 (1,3] 1 0.2 20 1 25 0.01
#3 10 (3,5] 1 0.2 100 1 25 0.002
#4 10 (5,6] 1 0.2 10 1 25 0.02
#5 20 (0,1] 1 0.3 80 1 25 0.00375
#6 20 (1,3] 3 0.900 140 3 75 0.00643
#7 30 (1,3] 1 0.1 20 1 50 0.005
#8 30 (3,5] 1 0.1 40 1 50 0.0025
The cut
step can also be changed with findInterval
NOTE: The idea of split/map
is based on the OP's title about looping and getting the output
Upvotes: 2
Reputation: 2102
Since you are already using dplyr
, just add id
as one of the grouping variables (no need to previously separate your data):
data_agg <- data_x %>%
group_by(id, priceseg = cut(as.numeric(price), c(0, 1, 3, 5, 6))) %>%
summarise(price_n = n_distinct(bokey),
Cost = sum(cost, na.rm = T),
Revenue = sum(revenue, na.rm = T),
clicks = n_distinct(bokey)) %>%
mutate(price_n2 = round(100 * prop.table(price_n), 2),
(zet = Cost/Revenue))
# A tibble: 8 x 8
# Groups: id [3]
# id priceseg price_n Cost Revenue clicks price_n2 `(zet = Cost/Revenue)`
# <dbl> <fct> <int> <dbl> <dbl> <int> <dbl> <dbl>
# 1 10 (0,1] 1 0.2 30 1 25 0.00667
# 2 10 (1,3] 1 0.2 20 1 25 0.01
# 3 10 (3,5] 1 0.2 100 1 25 0.002
# 4 10 (5,6] 1 0.2 10 1 25 0.02
# 5 20 (0,1] 1 0.3 80 1 25 0.00375
# 6 20 (1,3] 3 0.900 140 3 75 0.00643
# 7 30 (1,3] 1 0.1 20 1 50 0.005
# 8 30 (3,5] 1 0.1 40 1 50 0.0025
Upvotes: 2