datazang
datazang

Reputation: 1179

How to loop data in R?

Here is a piece of my data:

data_x <- tribble(
  ~price, ~bokey,    ~id,     ~cost,    ~revenue,
     1,     "a",      10,      0.20,       30,      
     2,     "b",      20,      0.30,       60,  
     3,     "c",      20,      0.30,       40,
     4,     "d",      10,      0.20,      100, 
     5,     "e",      30,      0.10,       40,
     6,     "f",      10,      0.20,       10,
     1,     "g",      20,      0.30,       80,
     2 ,    "h",      10,      0.20,       20,
     3,     "h",      30,      0.10,       20,
     3,     "i",      20,      0.30,       40,
)

As you see, there are three different type of IDs: 10, 20, 30. But in the real data, there are almost 100 ids. I want to aggregate the data based on these ids. Because I don't know how to do it in loop, I basically created some subsets:

data_10 <- data_x %>% filter(id == 10)
data_20 <- data_x %>% filter(id == 20)
data_30 <- data_x %>% filter(id == 30)

Here is the aggregated data:

data_agg <- data_10 %>% 
  group_by(priceseg = cut(as.numeric(price), c(0, 1, 3, 5, 6))) %>% 
  summarise(price_n = n_distinct(bokey),
            Cost = sum(cost, na.rm =  T),
            Revenue  =  sum(revenue, na.rm = T),
            clicks = n_distinct(bokey)) %>% 
  mutate(price_n2 = round(100 * prop.table(price_n), 2),
         (zet = Cost/Revenue))

But I want to have one more column that shows the id. Here is the desired data:

data_desired <- tribble(
  ~id, ~priceseg, ~price_n, ~Cost,  ~Revenue,  ~clicks,  ~price_n2, ~`(zet = Cost/Revenue)`
   10,   (0,1]        1      0.2        30         1        25             0.00667
   10,   (1,3]        1      0.2        20         1        25                0.01       
   10,   (3,5]        1      0.2       100         1        25               0.002     
   10,   (5,6]        1      0.2        10         1        25                0.02       
   20,
   20,
   .
   .
)  30,

How can I get it?

Upvotes: 1

Views: 72

Answers (2)

akrun
akrun

Reputation: 887851

An option is to split and loop over with map while specifying the .id

library(dplyr)
library(purrr)
data_x %>% 
     split(.$id) %>%
     map_dfr(~ 
            .x %>%  
                 group_by(priceseg = cut(as.numeric(price), c(0, 1, 3, 5, 6))) %>% 
                 summarise(price_n = n_distinct(bokey),
                           Cost = sum(cost, na.rm =  T),
                           Revenue  =  sum(revenue, na.rm = T),
                           clicks = n_distinct(bokey)) %>% 
                 mutate(price_n2 = round(100 * prop.table(price_n), 2),
      (zet = Cost/Revenue)), .id = "id" )
# A tibble: 8 x 8
#  id    priceseg price_n  Cost Revenue clicks price_n2 `(zet = Cost/Revenue)`
#  <chr> <fct>      <int> <dbl>   <dbl>  <int>    <dbl>                  <dbl>
#1 10    (0,1]          1 0.2        30      1       25                0.00667
#2 10    (1,3]          1 0.2        20      1       25                0.01   
#3 10    (3,5]          1 0.2       100      1       25                0.002  
#4 10    (5,6]          1 0.2        10      1       25                0.02   
#5 20    (0,1]          1 0.3        80      1       25                0.00375
#6 20    (1,3]          3 0.900     140      3       75                0.00643
#7 30    (1,3]          1 0.1        20      1       50                0.005  
#8 30    (3,5]          1 0.1        40      1       50                0.0025 

The cut step can also be changed with findInterval

NOTE: The idea of split/map is based on the OP's title about looping and getting the output

Upvotes: 2

David Jorquera
David Jorquera

Reputation: 2102

Since you are already using dplyr, just add id as one of the grouping variables (no need to previously separate your data):

data_agg <- data_x %>% 
  group_by(id, priceseg = cut(as.numeric(price), c(0, 1, 3, 5, 6))) %>% 
  summarise(price_n = n_distinct(bokey),
        Cost = sum(cost, na.rm =  T),
        Revenue  =  sum(revenue, na.rm = T),
        clicks = n_distinct(bokey)) %>% 
  mutate(price_n2 = round(100 * prop.table(price_n), 2),
     (zet = Cost/Revenue))



# A tibble: 8 x 8
# Groups:   id [3]
#      id priceseg price_n  Cost Revenue clicks price_n2 `(zet = Cost/Revenue)`
#   <dbl> <fct>      <int> <dbl>   <dbl>  <int>    <dbl>                  <dbl>
# 1    10 (0,1]          1 0.2        30      1       25                0.00667
# 2    10 (1,3]          1 0.2        20      1       25                0.01   
# 3    10 (3,5]          1 0.2       100      1       25                0.002  
# 4    10 (5,6]          1 0.2        10      1       25                0.02   
# 5    20 (0,1]          1 0.3        80      1       25                0.00375
# 6    20 (1,3]          3 0.900     140      3       75                0.00643
# 7    30 (1,3]          1 0.1        20      1       50                0.005  
# 8    30 (3,5]          1 0.1        40      1       50                0.0025 

Upvotes: 2

Related Questions