Sriharshitha j
Sriharshitha j

Reputation: 43

Count of values with a condition and group

I have a data frame, sorted by groupID and date:

d1 <- data.frame(groupID = c(1,1,1,1,1,3,3,3,3), 
                 date = c(1,2,3,4,5,6,7,8,9),
                 value = c(1,1,25,1,1,25,1,25,1))

> d1
 groupID date value
       1    1     1
       1    2     1
       1    3    25
       1    4     1
       1    5     1
       3    6    25
       3    7     1
       3    8    25
       3    9     1  

I want to create two new columns:

  1. For every occurrence of 25, count of value=1 before for each group
  2. For every occurrence of 25, count of value=1 after value=25 before the next value=25 for each group

Desired output:

 groupID date value Prev1s After1s
       1    1     1
       1    2     1
       1    3    25      2       2
       1    4     1
       1    5     1
       3    6    25      0       1
       3    7     1
       3    8    25      1       1
       3    9     1

I'm able to do the same using Excel by creating a counter and taking the previous value. I have tried achieving the same in R using sum, shift() but in vain.

Upvotes: 2

Views: 142

Answers (2)

Jaap
Jaap

Reputation: 83275

An alternative using the data.table-package in combination with the rle-function:

library(data.table)
setDT(d1)[, c('prev1s','after1s') := {p <- a <- rle(value);
                                      i <- p$values == 25;
                                      p$values[i] <- shift(p$lengths, fill = 0)[i];
                                      a$values[i] <- shift(a$lengths, type = 'lead', fill = 0)[i];
                                      p$values[!i] <- a$values[!i] <- NA;
                                      list(inverse.rle(p),inverse.rle(a))},
          by = groupID][]

which gives:

   groupID date value prev1s after1s
1:       1    1     1     NA      NA
2:       1    2     1     NA      NA
3:       1    3    25      2       2
4:       1    4     1     NA      NA
5:       1    5     1     NA      NA
6:       3    6    25      0       1
7:       3    7     1     NA      NA
8:       3    8    25      1       1
9:       3    9     1     NA      NA

Upvotes: 0

Andrew Gustar
Andrew Gustar

Reputation: 18435

You can do this with dplyr...

library(dplyr)
#first set up some grouping variables based on runs before and after 25s
d1 <- d1 %>% mutate(PrevGp=cumsum(lag(value==25,default = 1)),
                    AfterGp=cumsum(value==25)) %>% 
#use these to calculate the values you want for each group
  group_by(groupID,PrevGp) %>% mutate(Prev1s=sum(value)-25) %>% 
  group_by(groupID,AfterGp) %>% mutate(After1s=sum(value)-25) %>% 
  ungroup() %>% 
#remove values (set to "") other than for value==25
  mutate(Prev1s=replace(Prev1s,value!=25,""),
         After1s=replace(After1s,value!=25,"")) %>% 
#and remove the grouping variables
  select(-c(PrevGp,AfterGp))

d1
# A tibble: 9 x 5
  groupID  date value Prev1s After1s
    <dbl> <dbl> <dbl>  <chr>   <chr>
1       1     1     1               
2       1     2     1               
3       1     3    25      2       2
4       1     4     1               
5       1     5     1               
6       3     6    25      0       1
7       3     7     1               
8       3     8    25      1       1
9       3     9     1               

Upvotes: 1

Related Questions