Reputation: 1148
I am struggling to write code that calculates and then plots the growth rate. My data frame df
looks like this:
ID Jan_Score Dec_Score Cluster
A 0 5 1
B 19 14 2
F 13 21 3
D 12 10 2
M 27 33 4
P 54 54 4
My question is, how can we calculate (and if possible plot) the growth per student ID and then per cluster?
Any help would be greatly appreciated.
I am using the following formula to calculate growth per person (i.e., per ID):
df$growth = (df$Dec_Score - df$Jan_Score) / df$Jan_Score
The following posts are related but do not address my problem:
How to calculate growth with a positive and negative number?,
How to calculate percentage when old value is ZERO,
what is my increment percentage from 0 to 20?,
Growth calculation NaN with 0 value
For reference, the dput(df) is
dput(df)
structure(list(ID = c("A", "B", "F", "D", "M", "P"), Jan_Score = c(0L,
19L, 13L, 12L, 27L, 54L), Dec_Score = c(5L, 14L, 21L, 10L, 33L,
54L), Cluster = structure(c(1L, 2L, 3L, 2L, 4L, 4L), .Label = c("1",
"2", "3", "4"), class = "factor")), row.names = c(NA, -6L), class = "data.frame")
Upvotes: 0
Views: 86
Reputation: 66490
Perhaps:
df$growth = pmax(0, df$Dec_Score / pmax(0.1, df$Jan_Score) - 1)
Starting from the inside, this replaces any Jan_Score below 0.1 with 0.1 and then calculates the growth rate. If that rate is less than 0, it is replaced with 0. The 0.1 floor is arbitrary; I'm not sure what adjustment would make a "good offset" here, and you're in a better position to bring that sort of domain understanding.
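To make the effect of the two pmax() calls concrete, here is a minimal sketch that rebuilds the question's data and compares the raw formula with the floored one. The column name growth_raw is only an illustrative name, and the 0.1 floor is the same arbitrary choice as above.

```r
# Data from the question (Cluster kept as a factor, as in the dput output)
df <- data.frame(
  ID = c("A", "B", "F", "D", "M", "P"),
  Jan_Score = c(0L, 19L, 13L, 12L, 27L, 54L),
  Dec_Score = c(5L, 14L, 21L, 10L, 33L, 54L),
  Cluster = factor(c(1, 2, 3, 2, 4, 4))
)

# Raw formula from the question: dividing by a Jan_Score of 0 gives Inf for ID "A"
df$growth_raw <- (df$Dec_Score - df$Jan_Score) / df$Jan_Score

# Floored version: a Jan_Score below 0.1 is treated as 0.1,
# and any negative growth is replaced with 0
df$growth <- pmax(0, df$Dec_Score / pmax(0.1, df$Jan_Score) - 1)

print(df[, c("ID", "Jan_Score", "Dec_Score", "growth_raw", "growth")])
```

With this floor, ID A goes from Inf to a growth of 49 (5 / 0.1 - 1), which shows how strongly the result depends on the offset you pick. IDs B and D, whose scores fell, end up at 0.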
As for looking at clusters, it depends what you're trying to see. One approach, if you want to capture reliable observations of growth, could be to filter out rows you consider erroneous (below, rows where either score is 0 or where the score dropped), and then average the remaining Jan and Dec scores per cluster. E.g.
library(dplyr)
df %>%
  filter(pmin(Jan_Score, Dec_Score) > 0, Dec_Score >= Jan_Score) %>%
  group_by(Cluster) %>%
  summarize(across(Jan_Score:Dec_Score, mean)) %>%
  mutate(growth = Dec_Score / Jan_Score - 1)
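Since the question also asks about plotting, here is a rough sketch that stores the per-cluster result and draws it with base R's barplot(). The object name by_cluster and the axis labels are my own choices, and a bar chart is only one of several reasonable ways to show this.

```r
library(dplyr)

# Data from the question
df <- data.frame(
  ID = c("A", "B", "F", "D", "M", "P"),
  Jan_Score = c(0L, 19L, 13L, 12L, 27L, 54L),
  Dec_Score = c(5L, 14L, 21L, 10L, 33L, 54L),
  Cluster = factor(c(1, 2, 3, 2, 4, 4))
)

# Same pipeline as above, stored so it can be plotted
by_cluster <- df %>%
  filter(pmin(Jan_Score, Dec_Score) > 0, Dec_Score >= Jan_Score) %>%
  group_by(Cluster) %>%
  summarize(across(Jan_Score:Dec_Score, mean)) %>%
  mutate(growth = Dec_Score / Jan_Score - 1)

print(by_cluster)

# One bar per cluster that survived the filter
barplot(
  height = by_cluster$growth,
  names.arg = as.character(by_cluster$Cluster),
  xlab = "Cluster",
  ylab = "Growth (Dec / Jan - 1)",
  main = "Growth per cluster (filtered rows only)"
)
```

On the sample data only IDs F, M and P pass the filter, so clusters 1 and 2 drop out entirely and the plot shows just clusters 3 and 4. If you would rather keep every cluster, you would need to loosen the filter or handle the zero and negative cases another way.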
Upvotes: 1