Reputation: 23

Search by conditional grouping and logic?

I'm new and learning R. I'm trying to ask a question that I don't know the words for. Suppose I have a data frame such that:

df<-data.frame(ID=c("A","A","A","B","B","B","C","C","C"),
Week=c(1,2,3,1,2,3,1,2,3),
Variable=c(30,25,27,42,44,45,30,50,19))



  ID Week Variable
1  A    1       30
2  A    2       25
3  A    3       27
4  B    1       42
5  B    2       44
6  B    3       45
7  C    1       30
8  C    2       50
9  C    3       19

How can I find what is the average Variable at Week 2 for all ID that had Variable = 30 at Week 1?

For example, I would like the output in this example to = 37.5

Upvotes: 0

Answers (3)

Sal-laS

Reputation: 11649

Step 1: Obtain the IDs which had Variable=30 in Week1

res<-  subset( df,Variable==30 & Week==1, ID )

The output is:

> res
  ID
1  A
7  C

Step 2: Get all their variables at week 2:

dt<-subset(df,ID %in% as.vector(unlist(res))  & Week==2  ,select=c(ID,Variable))

The output is:

  ID Variable
2  A       25
8  C       50

Step 3: Get the mean:

  mean(dt$Variable)

The final output is:

37.5

In step 2 we have ID %in% as.vector(unlist(res)). So, what does it mean?

The %in% part simply is an operator which returns true if it finds an ID inside the right handside vector. For example, run the below sample:

  a<- 1:10

  b<-c(0,4,6,8,16)

  b %in% a

and the result is:

FALSE  TRUE  TRUE  TRUE FALSE

So, the %in% operator returns a Boolean value for each element of b. The result will be True if, that element exist in a, otherwise it returns False. As you see 0 and 16 have False.

But, the point is, a should be vector, meanwhile res is a data.frame so, i need to first unlist it, and then consider it as a vector (as.vector).

In conclusion, ID %in% as.vector(unlist(res)) checks if each ID exist in res or not.

Upvotes: 0

Nate

Reputation: 10671

This might be easier to read/see.

library(tidyverse)

df %>% 
  spread(Week, Variable) %>%
  filter(`1` == 30) %>%
  with(mean(`2`))

[1] 37.5

I think tidyverse code is easier to understand because you can read it left to right like you would any non-code text. And the piping %>% makes seeing the order of operations easier, no more parentheses to parse.

Upvotes: 1

Saurabh Chauhan

Reputation: 3221

First we need ID's which have entry for variable=30 AND week=1 and then from that ID's extract ID's with Week=2 and do avg(Variable)

Base R Solution:

mean(df[df$ID %in% (df[df$Week==1 & df$Variable==30,1]) & df$Week==2,3])

Output:

[1] 37.5

OR (another approach)

Using sqldf:

library(sqldf)
sqldf("select avg(Variable) from df where ID IN (select ID from df where variable=30 AND week=1) AND Week=2")

Output:

    avg(Variable)
1          37.5

Upvotes: 0

Search by conditional grouping and logic?

Answers (3)

Related Questions