Find the difference within a group based on a condition

Question

I have a data frame with entries such as;

V1 zipcode year V2 1 11747 2012 5 2 11747 2012 10 3 11747 2012 20 4 11747 2012 15 1 11000 2012 20 2 11000 2012 15 3 11000 2012 20

I want to find the difference in V2 for the maximum value of V1 for each zipcode year combination. In this case 15-5 and 20-20.

I would like the result to look like:

V1 zipcode year V2 V3 1 11747 2012 5 10 2 11747 2012 10 10 3 11747 2012 20 10 4 11747 2012 15 10 1 11000 2012 20 0 2 11000 2012 15 0 3 11000 2012 20 0

So for I have attempting to use group by and mutate with the following conditions: df %>% group_by(year, zipcode) %>% mutate(difV2 = df[df$V1== max(df$V1),4] - df[df$V1== min(df$V1),4]) Any tips would be greatly appreciated!

akrun · Accepted Answer

After the group_by, we need to only use the column names instead of df$V1. When we do df$V1, it is extracting the whole column and not the values that are specific to that group.

library(dplyr)
df %>% 
  group_by(year, zipcode) %>% 
  mutate(V3 = V2[V1== max(V1)] - V2[V1== min(V1)])
# A tibble: 7 x 5
# Groups:   year, zipcode [2]
#     V1 zipcode  year    V2    V3
#        
#1     1   11747  2012     5    10
#2     2   11747  2012    10    10
#3     3   11747  2012    20    10
#4     4   11747  2012    15    10
#5     1   11000  2012    20     0
#6     2   11000  2012    15     0
#7     3   11000  2012    20     0

Find the difference within a group based on a condition

Answers (1)

Related Questions