I want to calculate the Score difference after grouping by Year, State, Tier and Group. A stylised representation of my data looks like this:
dat2 <- data.frame(
  Year  = sample(1990:1996, 10, replace = TRUE),
  State = sample(c("AL", "CA", "NY"), 10, replace = TRUE),
  Tier  = sample(1:2, 10, replace = TRUE),
  Group = sample(c("A", "B"), 10, replace = TRUE),
  Score = rnorm(10))
I tried mutate() with group_by_() and .dots, but it takes the value from the adjacent row regardless of group (i.e. the grouping does not seem to work). I am mostly interested in plotting the yearly differences (à la a time series, even though some years would be NA), so this can be solved either by lagging or by calculating the next year's score.
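For reference, group_by_() with .dots is deprecated in current dplyr. A minimal sketch, assuming dplyr 1.0 or later, of grouping by a character vector of column names with across(all_of()) instead; the toy data and the name grp_cols are illustrative only:

```r
library(dplyr)

# Grouping columns held as strings, as one would have passed via .dots
grp_cols <- c("State", "Tier", "Group")

# Illustrative toy data (two years, two tiers, one state and group)
dat <- data.frame(
  Year  = c(1990, 1990, 1991, 1991),
  State = "AL",
  Tier  = c(1, 2, 1, 2),
  Group = "A",
  Score = c(75, 100, 95, 80)
)

res <- dat %>%
  arrange(Year) %>%
  # across(all_of()) turns the string vector into grouping columns
  group_by(across(all_of(grp_cols))) %>%
  # dplyr::lag is written explicitly so stats::lag is never picked up
  mutate(Diff = Score - dplyr::lag(Score)) %>%
  ungroup()

print(res)
```

Each Tier forms its own group here, so the first year in each group gets NA and the second year gets the change.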
Edit: So, if the dataset looks like:
Year State Tier Group Score
1990 AL 1 A 75
1990 AL 2 A 100
1990 AL 1 B 5
1990 AL 2 B 10
1991 AL 1 A 95
1991 AL 2 A 80
1991 AL 1 B 5
1991 AL 2 B 15
The desired end result would be:
Year State Tier Group Score Diff
1991 AL 1 A 95 20
1991 AL 1 B 5 0
1991 AL 2 A 80 -20
1991 AL 2 B 15 5
If I understand correctly, you are trying to calculate the year-to-year difference in Score within each combination of State, Tier and Group. The data need to be sorted chronologically for the difference to make any sense, which is what arrange(Year) does. Your example is too small for these combinations to repeat, but I believe the solution you are looking for is:
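library(dplyr)
dat2 %>%
  arrange(Year) %>%                        # sort chronologically first
  group_by(State, Tier, Group) %>%         # one series per combination
  mutate(ScoreDiff = Score - lag(Score))   # change from the previous row in the group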
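As a check, here is the same pipeline applied to the small example from the question's edit. The filter() step, which drops the first year of each group (it has no predecessor), and the final sort are my additions to match the desired table:

```r
library(dplyr)

# The example data from the question's edit
dat <- data.frame(
  Year  = rep(c(1990, 1991), each = 4),
  State = "AL",
  Tier  = rep(c(1, 2), times = 4),
  Group = rep(c("A", "A", "B", "B"), times = 2),
  Score = c(75, 100, 5, 10, 95, 80, 5, 15)
)

result <- dat %>%
  arrange(Year) %>%
  group_by(State, Tier, Group) %>%
  mutate(Diff = Score - dplyr::lag(Score)) %>%
  ungroup() %>%
  # First year of each group has no previous value, so its Diff is NA
  filter(!is.na(Diff)) %>%
  arrange(Year, State, Tier, Group)

print(result)
```

This yields the four 1991 rows with Diff values 20, 0, -20 and 5, as in the desired result.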
With your example data, the ScoreDiff column is mostly NA, because 10 rows rarely contain the same State/Tier/Group combination more than once. You can try the same code on a larger simulated dataset (I've also widened the year range to start at 1890 instead of 1990):
n <- 100
dat2 <- data.frame(
  Year  = sample(1890:1996, n, replace = TRUE),
  State = sample(c("AL", "CA", "NY"), n, replace = TRUE),
  Tier  = sample(1:2, n, replace = TRUE),
  Group = sample(c("A", "B"), n, replace = TRUE),
  Score = rnorm(n))
dat2 %>%
  arrange(Year) %>%
  group_by(State, Tier, Group) %>%
  mutate(ScoreDiff = Score - lag(Score))
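Note that lag() compares each row with the previous row in its group, which is not necessarily the previous calendar year when years are missing. A minimal sketch, using hypothetical toy data with a gap at 1992, that only keeps a difference when the neighbouring row is exactly one year away; it also shows the "next year's score" variant with lead(). It assumes at most one row per Year within each group:

```r
library(dplyr)

# Hypothetical toy data: 1992 is missing for this group
dat <- data.frame(
  Year  = c(1990, 1991, 1993),
  State = "AL", Tier = 1, Group = "A",
  Score = c(75, 95, 60)
)

res <- dat %>%
  arrange(Year) %>%
  group_by(State, Tier, Group) %>%
  mutate(
    # Change from the previous row, kept only if it is the previous year
    DiffPrev = if_else(Year - dplyr::lag(Year) == 1,
                       Score - dplyr::lag(Score), NA_real_),
    # Change to the next row, kept only if it is the following year
    DiffNext = if_else(dplyr::lead(Year) - Year == 1,
                       dplyr::lead(Score) - Score, NA_real_)
  ) %>%
  ungroup()

print(res)
```

The 1991 to 1993 step is set to NA rather than being reported as a one-year change, which keeps the gaps visible when plotting.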