Reputation: 37
I have a data frame with entries such as;
V1 zipcode year V2
1 11747 2012 5
2 11747 2012 10
3 11747 2012 20
4 11747 2012 15
1 11000 2012 20
2 11000 2012 15
3 11000 2012 20
I want to find the difference in V2 for the maximum value of V1 for each zipcode year combination. In this case 15-5 and 20-20.
I would like the result to look like:
V1 zipcode year V2 V3
1 11747 2012 5 10
2 11747 2012 10 10
3 11747 2012 20 10
4 11747 2012 15 10
1 11000 2012 20 0
2 11000 2012 15 0
3 11000 2012 20 0
So for I have attempting to use group by and mutate with the following conditions:
df %>% group_by(year, zipcode) %>% mutate(difV2 = df[df$V1== max(df$V1),4] - df[df$V1== min(df$V1),4])
Any tips would be greatly appreciated!
Upvotes: 2
Views: 47
Reputation: 887118
After the group_by
, we need to only use the column names instead of df$V1
. When we do df$V1
, it is extracting the whole column and not the values that are specific to that group.
library(dplyr)
df %>%
group_by(year, zipcode) %>%
mutate(V3 = V2[V1== max(V1)] - V2[V1== min(V1)])
# A tibble: 7 x 5
# Groups: year, zipcode [2]
# V1 zipcode year V2 V3
# <int> <int> <int> <int> <int>
#1 1 11747 2012 5 10
#2 2 11747 2012 10 10
#3 3 11747 2012 20 10
#4 4 11747 2012 15 10
#5 1 11000 2012 20 0
#6 2 11000 2012 15 0
#7 3 11000 2012 20 0
Upvotes: 1