TanZor

Reputation: 237

Test specific row value in dplyr

I have the dataframe:

df <- data.frame(subject=c('x','x','x','y','y','y','z','z','z'),
                 trial=c(1,2,3,1,2,3,1,2,3),
                 condition=c('A','A','B','B','B','B','A','A','A'))

I would like to create a list of subjects for which the condition in trial number 1 is A and the condition in trial 3 is B. In the example above, this would be subject x only.

Ideally I would like to do this by grouping by subject, summarizing first_condition and third_condition for each subject, and then filtering with first_condition == 'A' & third_condition == 'B'. But I don't know how to extract the condition for a specific trial number inside summarise().

Any ideas? Thanks!

Upvotes: 1

Views: 43

Answers (3)

r2evans

Reputation: 160782

Since the logic is based on specific pairs of trial and condition, it might be useful to inner-join your data with a table of permitted pairs, and then find the subjects for which all of those trials/conditions are present.

library(dplyr)
fltr <- tibble(trial = c(1, 3), condition = c("A", "B"))
fltr
# # A tibble: 2 x 2
#   trial condition
#   <dbl> <chr>    
# 1     1 A        
# 2     3 B        

We'll do an "inner join", which means that we only retain rows that are present on both sides.

df %>%
  inner_join(fltr, by = c("trial", "condition"))
#   subject trial condition
# 1       x     1         A
# 2       x     3         B
# 3       y     3         B
# 4       z     1         A

From here, we keep only the subjects that matched both trials:

df %>%
  inner_join(fltr, by = c("trial", "condition")) %>%
  group_by(subject) %>%
  filter(all(c(1, 3) %in% trial)) %>%
  ungroup()
# # A tibble: 2 x 3
#   subject trial condition
#   <chr>   <dbl> <chr>    
# 1 x           1 A        
# 2 x           3 B        
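
If you only need the subject IDs themselves (the "list of subjects" the question asks for), a minimal sketch of the same idea could use semi_join() instead, which keeps the matching rows of df without adding columns from fltr, and then count how many permitted pairs each subject matched. The distinct() step is an added guard against duplicate rows for the same trial, and stringsAsFactors = FALSE is added only so the columns are character on older R versions; neither is part of the answer above.

```r
library(dplyr)

# same data as the question; stringsAsFactors = FALSE keeps columns character on R < 4.0
df <- data.frame(subject = c('x','x','x','y','y','y','z','z','z'),
                 trial = c(1,2,3,1,2,3,1,2,3),
                 condition = c('A','A','B','B','B','B','A','A','A'),
                 stringsAsFactors = FALSE)

# permitted (trial, condition) pairs
fltr <- tibble(trial = c(1, 3), condition = c("A", "B"))

subjects <- df %>%
  semi_join(fltr, by = c("trial", "condition")) %>%  # rows matching a permitted pair
  distinct(subject, trial) %>%                       # one row per matched trial
  count(subject) %>%                                 # number of pairs matched per subject
  filter(n == nrow(fltr)) %>%                        # subjects that matched every pair
  pull(subject)

subjects
# [1] "x"
```

Comparing n with nrow(fltr) means the same code keeps working if more (trial, condition) pairs are added to fltr.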

Another method is to pivot wider, filter on specific trials, and optionally pivot back to long format (using tidyr).

The initial pivot-wider:

df %>%
  tidyr::pivot_wider(id_cols = subject, names_from = "trial", names_prefix = "trial_", values_from = "condition")
# # A tibble: 3 x 4
#   subject trial_1 trial_2 trial_3
#   <chr>   <chr>   <chr>   <chr>  
# 1 x       A       A       B      
# 2 y       B       B       B      
# 3 z       A       A       A      

And then a very readable filter:

df %>%
  tidyr::pivot_wider(id_cols = subject, names_from = "trial", names_prefix = "trial_", values_from = "condition") %>%
  filter(trial_1 == "A" & trial_3 == "B")
# # A tibble: 1 x 4
#   subject trial_1 trial_2 trial_3
#   <chr>   <chr>   <chr>   <chr>  
# 1 x       A       A       B      

You can convert it back again with:

df %>%
  tidyr::pivot_wider(id_cols = subject, names_from = "trial", names_prefix = "trial_", values_from = "condition") %>%
  filter(trial_1 == "A" & trial_3 == "B") %>%
  tidyr::pivot_longer(-subject, names_to = "trial", values_to = "condition")
# # A tibble: 3 x 3
#   subject trial   condition
#   <chr>   <chr>   <chr>    
# 1 x       trial_1 A        
# 2 x       trial_2 A        
# 3 x       trial_3 B        

This has the advantage of keeping all trials for that subject, regardless of whether each trial was one of the 1/A and 3/B pairs we initially filtered on.
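
After the round trip, trial holds strings such as "trial_1" rather than the original numbers. A small sketch, assuming a tidyr version whose pivot_longer() supports the names_prefix and names_transform arguments, strips the prefix and converts the names back to numbers so the result lines up with the original df:

```r
library(dplyr)
library(tidyr)

df <- data.frame(subject = c('x','x','x','y','y','y','z','z','z'),
                 trial = c(1,2,3,1,2,3,1,2,3),
                 condition = c('A','A','B','B','B','B','A','A','A'),
                 stringsAsFactors = FALSE)

long_again <- df %>%
  pivot_wider(id_cols = subject, names_from = "trial", names_prefix = "trial_",
              values_from = "condition") %>%
  filter(trial_1 == "A" & trial_3 == "B") %>%
  # names_prefix removes "trial_" and names_transform turns "1", "2", "3" into numbers
  pivot_longer(-subject, names_to = "trial", names_prefix = "trial_",
               names_transform = list(trial = as.numeric),
               values_to = "condition")

long_again
```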

Upvotes: 0

Andrew Gustar

Reputation: 18435

I think this is what you are describing...

df %>% group_by(subject) %>% 
  summarise(first_cond = condition[trial==1],
            third_cond = condition[trial==3]) %>% 
  filter(first_cond == "A",
         third_cond == "B")

# A tibble: 1 x 3
  subject first_cond third_cond
  <chr>   <chr>      <chr>     
1 x       A          B  

This will work provided each subject has exactly one row for trial 1 and exactly one row for trial 3.
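
If some subjects might be missing trial 1 or trial 3, or have duplicate rows for them, one possible variant of the same summarise() idea wraps each lookup in dplyr's first(). That always gives one value per subject: the first matching row, or NA when the trial is absent, and filter() then drops rows where a comparison is NA. The extra subject "w" below is hypothetical data added only to show the missing-trial case.

```r
library(dplyr)

df <- data.frame(subject = c('x','x','x','y','y','y','z','z','z'),
                 trial = c(1,2,3,1,2,3,1,2,3),
                 condition = c('A','A','B','B','B','B','A','A','A'),
                 stringsAsFactors = FALSE)

# hypothetical subject "w" that has trial 1 but no trial 3
df2 <- rbind(df, data.frame(subject = "w", trial = 1, condition = "A",
                            stringsAsFactors = FALSE))

subjects <- df2 %>%
  group_by(subject) %>%
  # first() returns NA for an empty selection, so every subject yields one row
  summarise(first_cond = first(condition[trial == 1]),
            third_cond = first(condition[trial == 3])) %>%
  filter(first_cond == "A", third_cond == "B") %>%  # NA comparisons are dropped
  pull(subject)

subjects
# [1] "x"
```

pull(subject) returns the matching subjects as a plain character vector, which is the list the question asks for.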

Upvotes: 1

Karthik S

Reputation: 11596

See if this answers your question:

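df %>%
  filter(trial %in% c(1, 3)) %>%
  group_by(subject) %>%
  # check which condition belongs to which trial; counting two distinct
  # conditions alone would also keep a subject with B on trial 1 and A on trial 3
  filter(any(condition[trial == 1] == "A") & any(condition[trial == 3] == "B")) %>%
  ungroup()
# A tibble: 2 x 3
  subject trial condition
  <chr>   <dbl> <chr>    
1 x           1 A        
2 x           3 B        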

Upvotes: 0
