Sandy

Reputation: 1148

Growth rate in student abilities

I am struggling with writing code to calculate and then plot the growth rate. My data frame df looks like this:

ID  Jan_Score  Dec_Score  Cluster
A   0          5          1
B   19         14         2
F   13         21         3
D   12         10         2
M   27         33         4
P   54         54         4

My question is: how can I calculate (and, if possible, plot) the growth per student ID and then per cluster?

Any help would be greatly appreciated.

Partial solution

I am using the following formula to calculate growth per person (i.e., per ID):

df$growth = (df$Dec_Score - df$Jan_Score) / df$Jan_Score
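As a quick check of the formula above, here is a minimal sketch that rebuilds df from the dput() values given further down and applies it. It shows the problem the related posts are about: student A starts at 0, so the division yields Inf rather than a usable number.

```r
# Rebuild the example data (same values as the dput() output in this question)
df <- data.frame(
  ID        = c("A", "B", "F", "D", "M", "P"),
  Jan_Score = c(0L, 19L, 13L, 12L, 27L, 54L),
  Dec_Score = c(5L, 14L, 21L, 10L, 33L, 54L),
  Cluster   = factor(c(1, 2, 3, 2, 4, 4))
)

# Relative growth per student: (December - January) / January
df$growth <- (df$Dec_Score - df$Jan_Score) / df$Jan_Score

# Student A has Jan_Score == 0, so its growth is 5 / 0 = Inf
print(df[, c("ID", "growth")])
```

The declines for B and D come out negative, and P (no change) comes out as 0; only the zero start is undefined.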


The following posts are related but do not address my problem:

How to calculate growth with a positive and negative number?

How to calculate percentage when old value is ZERO

what is my increment percentage from 0 to 20?

Growth calculation NaN with 0 value

For reference, the dput(df) output is:

dput(df)
structure(list(ID = c("A", "B", "F", "D", "M", "P"), Jan_Score = c(0L, 
19L, 13L, 12L, 27L, 54L), Dec_Score = c(5L, 14L, 21L, 10L, 33L, 
54L), Cluster = structure(c(1L, 2L, 3L, 2L, 4L, 4L), .Label = c("1", 
"2", "3", "4"), class = "factor")), row.names = c(NA, -6L), class = "data.frame")

Upvotes: 0

Views: 86

Answers (1)

Jon Spring

Reputation: 66490

Perhaps:

df$growth = pmax(0, df$Dec_Score / pmax(0.1, df$Jan_Score) - 1)

Starting from the inside, this replaces any Jan_Score below 0.1 with 0.1 and then calculates the growth rate; if that rate is negative, it is replaced with 0. I'm not sure what arbitrary adjustment you want to make to get a "good offset". You're in a better position to bring that sort of domain understanding.
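To see what the clamped formula does to this particular data, the sketch below (base R only, rebuilding df from the question's dput() values) applies it and draws one bar per student with barplot(). The 0.1 floor is the answer's arbitrary offset, not a standard value: it turns student A's rise from 0 to 5 into a growth of 49 (5 / 0.1 - 1), while the declines for B and D are reported as 0.

```r
df <- data.frame(
  ID        = c("A", "B", "F", "D", "M", "P"),
  Jan_Score = c(0L, 19L, 13L, 12L, 27L, 54L),
  Dec_Score = c(5L, 14L, 21L, 10L, 33L, 54L),
  Cluster   = factor(c(1, 2, 3, 2, 4, 4))
)

# Floor January scores at 0.1 to avoid dividing by zero (arbitrary offset),
# then clamp negative growth (score decreased) to 0
df$growth <- pmax(0, df$Dec_Score / pmax(0.1, df$Jan_Score) - 1)

print(round(df$growth, 3))

# Bar chart of growth per student ID
barplot(df$growth, names.arg = df$ID, xlab = "Student ID",
        ylab = "Growth (Dec / Jan - 1, floored at 0)")
```

This gives 49, 0, 0.615, 0, 0.222 and 0 for A, B, F, D, M and P. In the chart, the single zero-start student dominates the scale, which is a direct consequence of the chosen offset.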

As for clusters, it depends on what you're trying to see. One approach, if you want to capture reliable observations of growth, is to filter out rows you consider erroneous (here, a score of 0 or a score that decreased) and then average the remaining Jan and Dec scores per cluster. E.g.:

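library(dplyr)
df %>%
  # keep rows with nonzero scores whose score did not decrease
  filter(pmin(Jan_Score, Dec_Score) > 0, Dec_Score >= Jan_Score) %>%
  group_by(Cluster) %>%
  # average the January and December scores within each cluster
  summarize(across(Jan_Score:Dec_Score, mean)) %>%
  # growth of the cluster means
  mutate(growth = Dec_Score / Jan_Score - 1)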
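If dplyr is not available, the same filter-then-average idea can be sketched in base R with subset() and aggregate(); this is an equivalent rewrite of the dplyr pipeline above, not a different method. It also draws one bar per remaining cluster. With the question's data, only F (cluster 3) and M and P (cluster 4) pass the filter, so clusters 1 and 2 drop out entirely, which is worth keeping in mind when reading the result.

```r
df <- data.frame(
  ID        = c("A", "B", "F", "D", "M", "P"),
  Jan_Score = c(0L, 19L, 13L, 12L, 27L, 54L),
  Dec_Score = c(5L, 14L, 21L, 10L, 33L, 54L),
  Cluster   = factor(c(1, 2, 3, 2, 4, 4))
)

# Keep rows with nonzero scores whose score did not decrease
kept <- subset(df, pmin(Jan_Score, Dec_Score) > 0 & Dec_Score >= Jan_Score)

# Average January and December scores per cluster
res <- aggregate(cbind(Jan_Score, Dec_Score) ~ Cluster, data = kept, FUN = mean)

# Growth of the cluster means
res$growth <- res$Dec_Score / res$Jan_Score - 1
print(res)

# Bar chart of growth per cluster
barplot(res$growth, names.arg = as.character(res$Cluster),
        xlab = "Cluster", ylab = "Growth of mean score")
```

Cluster 3 gets 21 / 13 - 1 (about 0.615) and cluster 4 gets 43.5 / 40.5 - 1 (about 0.074).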

Upvotes: 1
