Reputation: 378
I'm looking for a way to calculate the change in scores between the levels of a factor (for example, questionnaire scores Pre and Post treatment). I want to find out what percentage of participants improved from Pre to Post and what percentage did not.
I have looked at some dplyr solutions, but I think I am missing a line of code somewhere; I'm just not sure where.
ID<-c("aaa","bbb","ccc","ddd","eee","fff", "ggg","aaa","bbb","ccc","ddd","eee","fff", "ggg")
Score<-sample(40,14)
Pre_Post<-c(1,1,1,1,1,1,1,2,2,2,2,2,2,2)
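# cbind() builds a character matrix, so convert it to a data frame first and then turn Score back into numbers
df<-as.data.frame(cbind(ID, Pre_Post, Score))
df$Score<-as.numeric(as.character(df$Score))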
#what I have tried
df2<-df%>%
group_by(ID, Pre_post)
mutate(Pct_change=mutate(Score/lead(Score)*100))
But I get error messages, and I wasn't confident the code was right to begin with.
Expected outcome: the percentage of IDs that improved. In my mock example, only 42.86% of IDs improved from Pre to Post, while 57.14% worsened. (Score is drawn with sample(), so the exact figures change from run to run.)
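For reference, with 7 IDs those two figures are simply 3/7 and 4/7. A quick check, assuming 3 IDs improved and the other 4 worsened (no ties):

```r
n_ids    <- 7   # participants in the mock data
improved <- 3   # IDs whose Post score is higher than their Pre score
worsened <- n_ids - improved

# Percentages rounded to two decimals
pct <- round(100 * c(improved = improved, worsened = worsened) / n_ids, 2)
print(pct)   # improved 42.86, worsened 57.14
```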
Any suggestions would be welcome :)
Upvotes: 1
Views: 187
Reputation: 389205
Another option with dplyr, assuming you always have exactly two values per ID with Pre as 1 and Post as 2, would be to group_by ID, subtract the first value from the second, and calculate the proportion of positive and negative differences.
library(dplyr)
df %>%
arrange(ID, Pre_Post) %>%
group_by(ID) %>%
summarise(val = Score[2] - Score[1]) %>%
summarise(total_pos = sum(val > 0)/n(),
total_neg = sum(val < 0)/ n())
# A tibble: 1 x 2
# total_pos total_neg
# <dbl> <dbl>
#1 0.429 0.571
data
ID <- c("aaa","bbb","ccc","ddd","eee","fff", "ggg","aaa","bbb",
"ccc","ddd","eee","fff", "ggg")
Score <- sample(40,14)
Pre_Post <- c(1,1,1,1,1,1,1,2,2,2,2,2,2,2)
df <- data.frame(ID, Pre_Post, Score)
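If you would rather not load dplyr, the same idea (sort, take Post minus Pre for each ID, then count the signs) can be sketched in base R with tapply(). The fixed scores below are hypothetical, used instead of sample() so the result is reproducible, and like the dplyr version it assumes exactly two rows per ID.

```r
# Hypothetical fixed scores so the result is reproducible (Pre = 1, Post = 2)
df <- data.frame(
  ID = rep(c("aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg"), times = 2),
  Pre_Post = rep(c(1, 2), each = 7),
  Score = c(19, 39, 2, 38, 36, 23, 21,   # Pre scores
            24, 35, 29, 15, 9, 10, 27)   # Post scores
)

# Sort so that, within each ID, the Pre row comes before the Post row
df <- df[order(df$ID, df$Pre_Post), ]

# Post minus Pre for each ID (assumes exactly two rows per ID)
val <- tapply(df$Score, df$ID, function(s) s[2] - s[1])

# Proportions of positive (improved) and negative (worsened) differences
res <- c(total_pos = mean(val > 0), total_neg = mean(val < 0))
print(round(res, 3))
```

mean() of a logical vector is the share of TRUE values, so it plays the same role as sum(val > 0)/n() in the dplyr version.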
Upvotes: 1
Reputation: 11981
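You have several typos, which is why you get an error: the column is Pre_Post (not Pre_post), the %>% after group_by() is missing, and mutate() is nested inside another mutate(). Also, grouping by Pre_Post as well as ID leaves a single row per group, so lead() can only return NA.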
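For reference, here is a sketch of the original Pct_change attempt with those problems fixed: group by ID only, sort Pre before Post, and compare each Pre score with the following (Post) score via lead(). The fixed scores are hypothetical, chosen so the result is reproducible, and the formula (Post minus Pre, divided by Pre, times 100) is an assumption about what Pct_change was meant to measure.

```r
library(dplyr)

# Hypothetical fixed scores so the result is reproducible (Pre = 1, Post = 2)
df <- data.frame(
  ID = rep(c("aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg"), times = 2),
  Pre_Post = rep(c(1, 2), each = 7),
  Score = c(19, 39, 2, 38, 36, 23, 21,   # Pre scores
            24, 35, 29, 15, 9, 10, 27)   # Post scores
)

df2 <- df %>%
  group_by(ID) %>%                          # one group per participant
  arrange(Pre_Post, .by_group = TRUE) %>%   # Pre row before Post row in each group
  # % change from Pre to Post, stored on the Pre row (lead() is the Post score)
  mutate(Pct_change = (lead(Score) - Score) / Score * 100) %>%
  filter(!is.na(Pct_change)) %>%            # drop the Post rows, whose lead() is NA
  ungroup()

print(df2)

# Share of participants whose score went up
improved_share <- mean(df2$Pct_change > 0)
print(improved_share)
```

Dividing by the Pre score would give Inf for a Pre score of 0; sample(40, 14) only draws values from 1 to 40, so that cannot happen with the mock data.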
You can do something like this to get old and new scores side by side:
library(tidyverse)
df %>%
spread(Pre_Post, Score) %>%
rename(Score_pre = `1`, Score_post = `2`)
ID Score_pre Score_post
1 aaa 19 24
2 bbb 39 35
3 ccc 2 29
4 ddd 38 15
5 eee 36 9
6 fff 23 10
7 ggg 21 27
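spread() still works, but since tidyr 1.0.0 it is superseded by pivot_wider(). A sketch of the same reshape with pivot_wider(), on hypothetical fixed scores; the Time helper column is introduced here only so the Score_pre and Score_post names come out directly without a rename() step.

```r
library(tidyr)

# Hypothetical fixed scores in long format (Pre = 1, Post = 2)
df <- data.frame(
  ID = rep(c("aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg"), times = 2),
  Pre_Post = rep(c(1, 2), each = 7),
  Score = c(19, 39, 2, 38, 36, 23, 21,
            24, 35, 29, 15, 9, 10, 27)
)

# Label the two time points so pivot_wider() creates readable column names
df$Time <- ifelse(df$Pre_Post == 1, "Score_pre", "Score_post")

# One row per ID, with Pre and Post scores side by side
wide <- pivot_wider(df[, c("ID", "Time", "Score")],
                    names_from = Time, values_from = Score)
print(wide)

# Share of IDs whose Post score is higher than their Pre score
improved_share <- mean(wide$Score_post > wide$Score_pre)
print(improved_share)
```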
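To count the improvements, Score has to be numeric. If df was built with cbind(), Score is character (or a factor in R before 4.0.0), so convert it via as.character() first; calling as.numeric() directly on a factor returns its level codes instead of the scores:
df %>% as_tibble() %>%
# as.character() first, so a factor is not turned into its level codes
mutate(Score = as.numeric(as.character(Score))) %>%
spread(Pre_Post, Score) %>%
rename(Score_pre = `1`, Score_post = `2`) %>%
# "0" = score went down, "1" = score went up
mutate(improve = if_else(Score_pre > Score_post, "0", "1")) %>%
group_by(improve) %>%
summarise(n = n()) %>%
mutate(percentage = n / sum(n))
# A tibble: 2 x 3
improve n percentage
<chr> <int> <dbl>
1 0 4 0.571
2 1 3 0.429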
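Why the conversion matters: cbind() on plain vectors builds a character matrix, and before R 4.0.0 as.data.frame() turned character columns into factors by default. as.numeric() on a factor returns its internal level codes, and the levels are sorted as text, so comparing the codes can disagree with comparing the real scores. A small base-R demonstration:

```r
# cbind() on plain vectors returns a matrix, and a matrix holds a single type,
# so the numeric scores are coerced to character.
m <- cbind(ID = c("aaa", "bbb"), Score = c(9, 10))
print(typeof(m))                     # "character"

# If such a column becomes a factor, as.numeric() returns the level codes,
# and the levels are sorted as text, so "10" comes before "9".
f <- factor(c("9", "10"))
print(levels(f))                     # "10" "9"
print(as.numeric(f))                 # 2 1  (level codes, not scores)
print(as.numeric(as.character(f)))   # 9 10 (the actual scores)
```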
Upvotes: 2