R_is_Broken
R_is_Broken

Reputation: 49

If rows match on two conditions, subtract in particular order, Tidyverse

I am trying to write Tidyverse code that finds two rows that have exactly matching values on two conditions. The rows should match on Participant_ID & Indicator. There should be no more than two rows that match exactly on these two values. In this pair of matches, one will have occurred at timepoint 1 and the other at timepoint 4. After the matches are identified, I want the score at timepoint 4 to be subtracted from the score at timepoint 1. I would also like to preserve the Group number in the final tibble.

There will be rows that don't have matches. Those can be omitted, if possible. I don't want them in the resulting tibble.

I am having trouble wrapping my head around this, so thank you very much for your help!

example <- tibble (
  Participant_ID = c('Part1','Part2','Part1','Part2','Part1','Part2','Part1','Part2'),
  Indicator =c('item1','item1','item1','item1','item2','item2','item2','item2'),
  Timepoint = c(1,1,4,4,1,1,4,4),
  Score = c(3,3,1.5,3,4,4,3.5,3.5),
Group = c(1,2,1,2,1,2,1,2))

Upvotes: 0

Views: 96

Answers (3)

r2evans
r2evans

Reputation: 160447

example %>%
  pivot_wider(c(Participant_ID, Indicator, Group), names_from = Timepoint, values_from = Score) %>%
  transmute(Participant_ID, Indicator, Group, Score = `1` - `4`)
# A tibble: 4 x 4
#   Participant_ID Indicator Group Score
#   <chr>          <chr>     <dbl> <dbl>
# 1 Part1          item1         1   1.5
# 2 Part2          item1         2   0  
# 3 Part1          item2         1   0.5
# 4 Part2          item2         2   0.5

Data

example <- structure(list(Participant_ID = c("Part1", "Part2", "Part1", "Part2", "Part1", "Part2", "Part1", "Part2"), Indicator = c("item1", "item1", "item1", "item1", "item2", "item2", "item2", "item2"), Timepoint = c(1, 1, 4, 4, 1, 1, 4, 4), Score = c(3, 3, 1.5, 3, 4, 4, 3.5, 3.5), Group = c(1, 2, 1, 2, 1, 2, 1, 2)), row.names = c(NA, -8L), class = c("tbl_df", "tbl", "data.frame"))

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388992

You could arrange the data by descending Timepoint and then use diff by group.

library(dplyr)

example %>%
  arrange(Participant_ID, Indicator, desc(Timepoint)) %>%
  group_by(Participant_ID, Indicator) %>%
  summarise(Score = diff(Score))

# Participant_ID Indicator Score
#  <chr>          <chr>     <dbl>
#1 Part1          item1       1.5
#2 Part1          item2       0.5
#3 Part2          item1       0  
#4 Part2          item2       0.5

Upvotes: 1

semaphorism
semaphorism

Reputation: 866

One tidyverse approach is to use use pivot_wider() in tidyr to place the matches matches into one row, then calculate the difference between the two scores:

example %>% 
    pivot_wider(id_cols = c(Participant_ID, Indicator), values_from = Score, names_from = Timepoint, names_prefix = "Score_Timepoint_") %>%
    mutate(Score_difference = Score_Timepoint_1 - Score_Timepoint_4)

This produces:

# A tibble: 4 x 5
  Participant_ID Indicator Score_Timepoint_1 Score_Timepoint_4 Score_difference
  <chr>          <chr>                 <dbl>             <dbl>            <dbl>
1 Part1          item1                     3               1.5              1.5
2 Part2          item1                     3               3                0  
3 Part1          item2                     4               3.5              0.5
4 Part2          item2                     4               3.5              0.5

Upvotes: 1

Related Questions