Matt W.

Reputation: 3722

conditional summarising/mutate in dplyr

I'm trying to compute a conditional sum within groups that I've already defined, and I can't work out how to do it.

After grouping by f, I want to sum the amt column only for the rows that count as type r1.

Reproducible code:

library(dplyr)

s <- sample(c('one', 'two'), 96, replace = TRUE)
f <- sample(c('a','s','d','f'), 96, replace = TRUE)
r1_amt <- runif(96, 1, 100)
r2_amt <- runif(96, 1, 100)
r3_amt <- runif(96, 1, 100)
x <- tibble(s, f, r1_amt, r2_amt, r3_amt) # data_frame() is deprecated in favour of tibble()

smy <- x %>%
  group_by(f) %>%
  summarise(n = n(), # population in each f group
            num_r1 = sum(r1_amt >= 50)) # number of rows with r1_amt >= 50 in each f group

I've tried .[r1_amt >= 50]$amt, cumsum(r1_amt >= 50), sum(ifelse(r1_amt >= 50, r1_amt, 0)) but haven't been able to come up with the grouped numbers.

For example, a single row could have 60 for r1, 40 for r2, and 55 for r3; it should then be included in the summed amount column for r1 and r3 only.

Upvotes: 1

Views: 172

Answers (2)

M--

Reputation: 28825

There may be a cleaner way to do this, but this should work:

x.v2 <- x # temp variable
# replace amt with 0 wherever type != 'r1' (assumes column 3 is amt and column 4 is type)
x.v2[which(x[[4]] != 'r1'), 3] <- 0

smy <- x.v2 %>%
  group_by(f) %>%
  summarise(n = n(), # population in each f group
            num_r1 = sum(amt)) # sum of values for type == 'r1' in each group f

rm(x.v2) # remove temp variable

smy # output for seed = 123 (use set.seed(123) for building data)


#   f  n   num_r1
# 1 a 20 114.1879
# 2 d 28 611.9858
# 3 f 19 351.5366
# 4 s 29 357.8402
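
If the data is in the wide layout from the question's reproducible code (r1_amt, r2_amt, r3_amt) rather than separate amt and type columns, the same idea can probably be written without a temp copy by subsetting inside summarise. This is only a minimal sketch: it assumes the threshold of 50 from the question's num_r1 line is what decides whether a row counts for r1, the sum_r1 name is made up for illustration, and the seed is there only to make the sketch repeatable.

```r
library(dplyr)

set.seed(123)  # assumption: any seed works, this only makes the sketch repeatable
x <- tibble(
  f      = sample(c("a", "s", "d", "f"), 96, replace = TRUE),
  r1_amt = runif(96, 1, 100),
  r2_amt = runif(96, 1, 100),
  r3_amt = runif(96, 1, 100)
)

smy <- x %>%
  group_by(f) %>%
  summarise(
    n      = n(),                       # rows in each f group
    num_r1 = sum(r1_amt >= 50),         # rows where r1_amt is at least 50
    sum_r1 = sum(r1_amt[r1_amt >= 50])  # total of only those r1_amt values
  )

smy
```

Subsetting with r1_amt[r1_amt >= 50] is evaluated per group, so a group with no qualifying rows simply gets a sum of 0.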

Upvotes: 1

user295691

Reputation: 7248

It sounds like what you want to do is just group by both f and type to compute the per-f/type statistics.

> res <- x %>% group_by(f, type) %>% summarise(num_type = n(), sum_type = sum(amt))
> res
Source: local data frame [16 x 4]
Groups: f [?]

       f  type num_type   sum_type
   <chr> <chr>    <int>      <dbl>
1      a    r1       12   616.6610
2      a    r2        6   417.5589
3      a    r3        9   375.2246
4      a    r4        7   346.5796
5      d    r1        8   471.1253
...

You can use tidyr to go back to wide form for the sum_type field, but I would only do so for display purposes:

> res %>% spread(type, sum_type)  # res is the grouped summary from above
Source: local data frame [12 x 6]
Groups: f [4]

       f num_type       r1       r2       r3       r4
*  <chr>    <int>    <dbl>    <dbl>    <dbl>    <dbl>
1      a        6       NA 417.5589       NA       NA
2      a        7       NA       NA       NA 346.5796
3      a        9       NA       NA 375.2246       NA
...
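
For the wide r1_amt/r2_amt/r3_amt columns in the question's reproducible code, the group-by-both approach could look roughly like the sketch below. It assumes tidyr's pivot_longer and pivot_wider are available (they supersede gather and spread), and that an amount counts toward its type only when it is at least 50, as in the question's example row. The names long, res and wide are just illustrative.

```r
library(dplyr)
library(tidyr)

set.seed(123)  # assumption: seed only makes the sketch repeatable
x <- tibble(
  f      = sample(c("a", "s", "d", "f"), 96, replace = TRUE),
  r1_amt = runif(96, 1, 100),
  r2_amt = runif(96, 1, 100),
  r3_amt = runif(96, 1, 100)
)

# Wide to long: one row per original row and type
long <- x %>%
  pivot_longer(ends_with("_amt"), names_to = "type", values_to = "amt") %>%
  mutate(type = sub("_amt$", "", type))  # "r1_amt" becomes "r1"

# Keep only amounts that qualify, then summarise per f and type
res <- long %>%
  filter(amt >= 50) %>%
  group_by(f, type) %>%
  summarise(num_type = n(), sum_type = sum(amt), .groups = "drop")

# Back to wide for display; num_type is dropped first so each f stays on one row
wide <- res %>%
  select(-num_type) %>%
  pivot_wider(names_from = type, values_from = sum_type)

wide
```

Dropping num_type before widening matters: if it stays in, it acts as an extra identifying column and each f is split across several rows padded with NA, which is what the spread output above shows.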

Upvotes: 1
