Saul Garcia
Saul Garcia

Reputation: 940

Summarizing and creating new variable according to condition in R

I made this dummy dataset:

df = data.frame(Order = "Order1", 
       Condition = c("P", "A", "B", "C", "D", "E", "F"),
       Value = c(500, -10, -5,0,0, -10,0))

Supposing that som of the conditions belong to different groups.

list = list( Group1 = c("A", "B"), 
      Group2 = c("C", "D"), 
      Group3 = c("E","F"))

I need to aggregate them by condition where I get the sum and the count for each group.

Expected output:

Order   P   Group1 Group2 Group3 Group1n Group2n Group3n
Order1 500   -15     0      -10    2        0       1

I am was thinking something like:

df %>% group_by(Order) %>% summarise(Group1 = sum(Value[Condition == "A" | Condition == "B" ]),
                                 Group2 = sum(Value[Condition == "C" | Condition == "D" ] ),
                                 Group3 = sum(Value[Condition == "E" | Condition == "F"]),
                                 Group1n = length(Value[Condition == "A" | Condition == "B" ]),
                                 Group2n = length(Value[Condition == "C" | Condition == "D" ]),
                                 Group3n = length(Value[Condition == "E" | Condition == "F" ])) 

My output:

  # A tibble: 1 x 7
Order  Group1 Group2 Group3 Group1n Group2n Group3n
 <fct>   <dbl>  <dbl>  <dbl>   <dbl>   <int>   <int>
 Order1  -15.0      0  -10.0    2       2       2

But I am not able to have the countings right.. Also is there an efficient way where I can pass up the list and not writing explicitly Condition ==A or B ... etc.

Thank you

Upvotes: 0

Views: 47

Answers (2)

Gregor Thomas
Gregor Thomas

Reputation: 145765

I'd recommend this for a tidy solution:

group_map = data.frame(Condition = unlist(list), Group = rep(names(list), lengths(list)), stringsAsFactors = FALSE)

result = df %>% mutate(Condition = as.character(Condition)) %>%
    inner_join(group_map) %>%
    group_by(Group, Order) %>%
    summarize(sums = sum(Value), n_nonzero = sum(Value != 0))

result
# # A tibble: 3 x 4
# # Groups:   Group [?]
#    Group  Order  sums n_nonzero
#    <chr> <fctr> <dbl>     <int>
# 1 Group1 Order1   -15         2
# 2 Group2 Order1     0         0
# 3 Group3 Order1   -10         1

If you need the wide format, you can reshape multiple columns with data.table:

library(data.table)
setDT(result)
data.table::dcast(Order ~ Group, data = result, value.var = c("sums", "n_nonzero"))
#     Order sums_Group1 sums_Group2 sums_Group3 n_nonzero_Group1 n_nonzero_Group2 n_nonzero_Group3
# 1: Order1         -15           0         -10                2                0                1

Upvotes: 1

Melissa Key
Melissa Key

Reputation: 4551

This should give you what you want:

df %>% 
  group_by(Order) %>% 
  summarise(Group1 = sum(Value[Condition == "A" | Condition == "B" ]),
            Group2 = sum(Value[Condition == "C" | Condition == "D" ] ),
            Group3 = sum(Value[Condition == "E" | Condition == "F"]),
            Group1n = sum(Condition == "A" | Condition == "B"),
            Group2n = sum(Condition == "C" | Condition == "D"),
            Group3n = sum(Condition == "E" | Condition == "F"))

You can clean this up a bit more. This version is also counting only non-zero values (as opposed to all rows within each group.)

# don't rename "list"
list_of_groups = list( Group1 = c("A", "B"), 
  Group2 = c("C", "D"), 
  Group3 = c("E","F"))
df %>% 
  group_by(Order) %>% 
  summarise(Group1 = sum(Value[Condition %in% list_of_groups$Group1]),
            Group2 = sum(Value[Condition %in% list_of_groups$Group2] ),
            Group3 = sum(Value[Condition %in% list_of_groups$Group3]),
            Group1n = sum(Condition %in% list_of_groups$Group1 & Value != 0),
            Group2n = sum(Condition %in% list_of_groups$Group2 & Value != 0),
            Group3n = sum(Condition %in% list_of_groups$Group3 & Value != 0))

Taking advantage of your list of groups - that way you don't have to fix everything if you change your groups (if that is relevant).

Upvotes: 2

Related Questions