metaltoaster

Reputation: 378

Calculating percentage of increased and decreased values between factors

I'm looking for a way to calculate the change of scores between factors (for example, questionnaire scores between Pre and Post treatment). I want to figure out what percentage of participants improved and what percentage did not between Pre and Post.

I have looked at some dplyr solutions, but I think my attempt is missing a line of code and I am not sure which.

    ID<-c("aaa","bbb","ccc","ddd","eee","fff", "ggg","aaa","bbb","ccc","ddd","eee","fff", "ggg")
    Score<-sample(40,14)
    Pre_Post<-c(1,1,1,1,1,1,1,2,2,2,2,2,2,2)
    df<-cbind(ID, Pre_Post, Score)
    df$Score<-as.numeric(df$Score)
    df<-as.data.frame(df)


    #what I have tried
    df2<-df%>%
    group_by(ID, Pre_post)
    mutate(Pct_change=mutate(Score/lead(Score)*100))

But I get error messages. I also wasn't confident that the code was right to begin with.

Expected outcome: I want the percentage of IDs that have improved. In the mock example above, only 42.86% of IDs improved from Pre to Post, while 57.14% actually worsened.

Any suggestions would be welcome :)

Upvotes: 1

Views: 187

Answers (2)

Ronak Shah

Reputation: 389205

Another option with dplyr, assuming you always have exactly two values per ID with Pre coded as 1 and Post as 2, is to group by ID, subtract the first value from the second, and then calculate the proportion of positive and negative differences.

    library(dplyr)

    df %>%
      arrange(ID, Pre_Post) %>%
      group_by(ID) %>%
      summarise(val = Score[2] - Score[1]) %>%
      summarise(total_pos = sum(val > 0) / n(),
                total_neg = sum(val < 0) / n())

    # A tibble: 1 x 2
    #  total_pos total_neg
    #      <dbl>     <dbl>
    #1     0.429     0.571

data

    ID <- c("aaa","bbb","ccc","ddd","eee","fff", "ggg","aaa","bbb",
          "ccc","ddd","eee","fff", "ggg")
    Score <- sample(40,14)
    Pre_Post <- c(1,1,1,1,1,1,1,2,2,2,2,2,2,2)
    df <- data.frame(ID, Pre_Post, Score)

Upvotes: 1

Cettt

Reputation: 11981

You have several typos in your code, which is why you get an error.
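For what it's worth, this is how I read the problems in the attempt: `Pre_post` should be `Pre_Post`, there is no `%>%` after `group_by()`, and `mutate()` is nested inside `mutate()`. On top of that, `cbind()` builds a character matrix, so `df$Score` fails before the data frame even exists, and grouping by both `ID` and `Pre_Post` leaves one row per group, so `lead()` has nothing to look ahead to. A minimal corrected sketch follows. The fixed seed and the definition of `Pct_change` as the percentage change from Pre to Post are my assumptions, not something stated in the question.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the example is reproducible

# Build the data frame directly; cbind() would coerce everything to character
df <- data.frame(
  ID       = rep(c("aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg"), times = 2),
  Pre_Post = rep(1:2, each = 7),
  Score    = sample(40, 14)
)

df2 <- df %>%
  arrange(ID, Pre_Post) %>%   # make sure Pre (1) comes before Post (2)
  group_by(ID) %>%            # group by ID only, so each group has two rows
  mutate(Pct_change = (lead(Score) - Score) / Score * 100) %>%
  ungroup()
```

With this, each Pre row carries the percentage change to the matching Post score, and each Post row is `NA` because there is no later value to compare against.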

You can do something like this to get old and new scores side by side:

    library(tidyverse)

    df %>%
      spread(Pre_Post, Score) %>%
      rename(Score_pre = `1`, Score_post = `2`)

       ID Score_pre Score_post
    1 aaa        19         24
    2 bbb        39         35
    3 ccc         2         29
    4 ddd        38         15
    5 eee        36          9
    6 fff        23         10
    7 ggg        21         27

To get the number of improvements you have to convert Score to numeric first:

    df %>% as_tibble() %>%
      mutate(Score = as.numeric(Score)) %>%
      spread(Pre_Post, Score) %>%
      rename(Score_pre = `1`, Score_post = `2`) %>%
      mutate(improve = if_else(Score_pre > Score_post, "0", "1")) %>%
      group_by(improve) %>%
      summarise(n = n()) %>%
      mutate(percentage = n / sum(n))

    # A tibble: 2 x 3
      improve     n percentage
      <chr>   <int>      <dbl>
    1 0           3      0.429
    2 1           4      0.571
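In newer versions of tidyr, `spread()` is superseded by `pivot_wider()`, so the same idea could be sketched as below. This is only an illustration: the scores are typed in by hand from the side-by-side table earlier in this answer, and treating "Post higher than Pre" as an increase (and whether an increase counts as improvement) is an assumption that depends on the questionnaire.

```r
library(dplyr)
library(tidyr)

# Scores copied from the Score_pre / Score_post table (assumed data)
df <- data.frame(
  ID       = rep(c("aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg"), times = 2),
  Pre_Post = rep(1:2, each = 7),
  Score    = c(19, 39, 2, 38, 36, 23, 21,   # Pre
               24, 35, 29, 15, 9, 10, 27)   # Post
)

res <- df %>%
  # one row per ID, with columns Score_1 (Pre) and Score_2 (Post)
  pivot_wider(names_from = Pre_Post, values_from = Score,
              names_prefix = "Score_") %>%
  # mean() of a logical vector is the share of TRUE values
  summarise(pct_increased = mean(Score_2 > Score_1) * 100,
            pct_decreased = mean(Score_2 < Score_1) * 100)

res
```

For these seven IDs, three scores go up and four go down, so `pct_increased` is about 42.86 and `pct_decreased` about 57.14. Ties would fall into neither column, which may or may not be what you want.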

Upvotes: 2
