Reputation: 49
I am trying to write Tidyverse code that finds two rows that have exactly matching values on two conditions. The rows should match on Participant_ID & Indicator. There should be no more than two rows that match exactly on these two values. In this pair of matches, one will have occurred at timepoint 1 and the other at timepoint 4. After the matches are identified, I want the score at timepoint 4 to be subtracted from the score at timepoint 1. I would also like to preserve the Group number in the final tibble.
There will be rows that don't have matches. Those can be omitted, if possible. I don't want them in the resulting tibble.
I am having trouble wrapping my head around this, so thank you very much for your help!
example <- tibble (
Participant_ID = c('Part1','Part2','Part1','Part2','Part1','Part2','Part1','Part2'),
Indicator =c('item1','item1','item1','item1','item2','item2','item2','item2'),
Timepoint = c(1,1,4,4,1,1,4,4),
Score = c(3,3,1.5,3,4,4,3.5,3.5),
Group = c(1,2,1,2,1,2,1,2))
Upvotes: 0
Views: 96
Reputation: 160447
example %>%
pivot_wider(c(Participant_ID, Indicator, Group), names_from = Timepoint, values_from = Score) %>%
transmute(Participant_ID, Indicator, Group, Score = `1` - `4`)
# A tibble: 4 x 4
# Participant_ID Indicator Group Score
# <chr> <chr> <dbl> <dbl>
# 1 Part1 item1 1 1.5
# 2 Part2 item1 2 0
# 3 Part1 item2 1 0.5
# 4 Part2 item2 2 0.5
Data
example <- structure(list(Participant_ID = c("Part1", "Part2", "Part1", "Part2", "Part1", "Part2", "Part1", "Part2"), Indicator = c("item1", "item1", "item1", "item1", "item2", "item2", "item2", "item2"), Timepoint = c(1, 1, 4, 4, 1, 1, 4, 4), Score = c(3, 3, 1.5, 3, 4, 4, 3.5, 3.5), Group = c(1, 2, 1, 2, 1, 2, 1, 2)), row.names = c(NA, -8L), class = c("tbl_df", "tbl", "data.frame"))
Upvotes: 1
Reputation: 388992
You could arrange
the data by descending Timepoint
and then use diff
by group.
library(dplyr)
example %>%
arrange(Participant_ID, Indicator, desc(Timepoint)) %>%
group_by(Participant_ID, Indicator) %>%
summarise(Score = diff(Score))
# Participant_ID Indicator Score
# <chr> <chr> <dbl>
#1 Part1 item1 1.5
#2 Part1 item2 0.5
#3 Part2 item1 0
#4 Part2 item2 0.5
Upvotes: 1
Reputation: 866
One tidyverse
approach is to use use pivot_wider()
in tidyr
to place the matches matches into one row, then calculate the difference between the two scores:
example %>%
pivot_wider(id_cols = c(Participant_ID, Indicator), values_from = Score, names_from = Timepoint, names_prefix = "Score_Timepoint_") %>%
mutate(Score_difference = Score_Timepoint_1 - Score_Timepoint_4)
This produces:
# A tibble: 4 x 5
Participant_ID Indicator Score_Timepoint_1 Score_Timepoint_4 Score_difference
<chr> <chr> <dbl> <dbl> <dbl>
1 Part1 item1 3 1.5 1.5
2 Part2 item1 3 3 0
3 Part1 item2 4 3.5 0.5
4 Part2 item2 4 3.5 0.5
Upvotes: 1