Syed Ahmed
Syed Ahmed

Reputation: 209

Number of rows by Group ID conditional

I have a dataframe like this:

  ID S1 C
1 1  2  3  
2 1  2  3
3 3  1  1
4 6  2  5
5 6  7  5

What I need is the number of rows per group ID where S1 <= C. This is the desired output.

  ID  Obs 
1 1   2 
2 3   1 
3 6   1 

Even though the question was answered below, I have a follow up question: Is it possible to do the same for multiple columns (S1, S2, ..). For example for the dataframe below:

  ID S1  S2 C
1 1  2   2  3  
2 1  2   2  3
3 3  1   1  1
4 6  2   2  5
5 6  7   7  5

And then get:

  ID  S1.Obs S2.Obs 
1 1   2      2
2 3   1      1
3 6   1      1

Upvotes: 0

Views: 68

Answers (3)

akrun
akrun

Reputation: 886938

An option with data.table

library(data.table)
setDT(df)[S1 <=C, .(Obs = .N), ID]
#   ID Obs
#1:  1   2
#2:  3   1
#3:  6   1

data

df <-  structure(list(ID = c(1L, 1L, 3L, 6L, 6L), S1 = c(2L, 2L, 1L, 2L, 7L),
C = c(3L, 3L, 1L, 5L, 5L)), class = "data.frame", row.names = c("1", "2", "3", "4", "5"))

Upvotes: 0

Darren Tsai
Darren Tsai

Reputation: 35554

A base R solution with aggregate().

aggregate(Obs ~ ID, transform(df, Obs = S1 <= C), sum)

#   ID Obs
# 1  1   2
# 2  3   1
# 3  6   1

A dplyr solution

library(dplyr)

df %>%
  filter(S1 <= C) %>%
  count(ID, name = "Obs")

#   ID Obs
# 1  1   2
# 2  3   1
# 3  6   1

Data

df <- structure(list(ID = c(1L, 1L, 3L, 6L, 6L), S1 = c(2L, 2L, 1L, 2L, 7L),
C = c(3L, 3L, 1L, 5L, 5L)), class = "data.frame", row.names = c("1", "2", "3", "4", "5"))

Extension

If you want to apply this rule on multiple columns such as S1, S2, S3:

df %>%
  group_by(ID) %>%
  summarise(across(starts_with("S"), ~ sum(.x <= C)))

Upvotes: 1

Alexander Alexandrov
Alexander Alexandrov

Reputation: 1362

data <- data.frame(
  ID = c(1, 1, 3, 6, 6),
  S1 = c(2, 2, 1, 2, 7),
  C = c(3, 3, 1, 5, 5)
)

library(dplyr)

data.filtered <- data[data$S1 <= data$C,]

data.filtered %>% group_by(ID) %>%
  summarize(Obs = length(ID))

Upvotes: 1

Related Questions