Creating a new dataframe based on index of another dataframe in R

Question

Hypothetical data:

hypo <- data.frame('X1' = c('a','b','a','b','a','b','a','b'),
       'X2' = c('x','x','y','y','x','x','y','y'),
       'X3' = c('m','m','m','m','n','n','n','n'),
       'X4' = c(1,6,4,9,10,7,8,3))

Output:

  X1 X2 X3 X4
1  a  x  m  1
2  b  x  m  6
3  a  y  m  4
4  b  y  m  9
5  a  x  n 10
6  b  x  n  7
7  a  y  n  8
8  b  y  n  3

We want to find the difference between X4 values for rows where the X1 and X2 values are the same and X3 is different. For example, we can do this for a single pair of rows using subset():

value <- (subset(hypo, X1 == 'a' & X2 == 'x' & X3 == 'm')$X4 
- subset(hypo, X1 == 'a' & X2 == 'x' & X3 == 'n')$X4)
# -9

How can we do this so that the difference between X4 values is calculated for all instances where X1 and X2 are the same and X3 is different?

Ideal output:

  X1 X2  m-n 
1  a  x  -9
2  b  x  -1  
3  a  y  -4  
4  b  y   6

Any help would be greatly appreciated.

Kota Mori · Accepted Answer

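This version makes the direction of the subtraction explicit: it computes m-n rather than n-m. It relies on each (X1, X2) group containing exactly one row with X3 == "m" and one with X3 == "n", as in the sample data.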

 library(dplyr)
 hypo %>% group_by(X1, X2) %>% 
   summarize(`m-n` = X4[X3=="m"] - X4[X3=="n"])
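For comparison, the same m-n calculation can be sketched in base R by splitting the rows on X3 and joining the two halves on X1 and X2. This is only an illustration, not part of the accepted answer: the names m_rows, n_rows, wide and result are made up here, and the sketch assumes exactly one m row and one n row per (X1, X2) pair.

```r
# Sample data from the question.
hypo <- data.frame(X1 = c('a','b','a','b','a','b','a','b'),
                   X2 = c('x','x','y','y','x','x','y','y'),
                   X3 = c('m','m','m','m','n','n','n','n'),
                   X4 = c(1,6,4,9,10,7,8,3))

# Split the rows by X3, keeping the key columns and the value.
m_rows <- hypo[hypo$X3 == "m", c("X1", "X2", "X4")]
n_rows <- hypo[hypo$X3 == "n", c("X1", "X2", "X4")]

# Join on the keys; the suffixes tell the two X4 columns apart.
wide <- merge(m_rows, n_rows, by = c("X1", "X2"), suffixes = c(".m", ".n"))

# check.names = FALSE keeps the non-syntactic column name "m-n".
result <- data.frame(X1 = wide$X1, X2 = wide$X2,
                     `m-n` = wide$X4.m - wide$X4.n,
                     check.names = FALSE)
print(result)
```

Note that merge() sorts by the key columns, so the rows come out as (a, x), (a, y), (b, x), (b, y) with differences -9, -4, -1 and 6, which is a different row order from the ideal output in the question.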
