Reputation: 133
I have survey data from patients and relatives of the patient. That data looks like this.
X1 X2 X3 X4 X5
a 11 2 3 4
b 11 2 4 5
a 12 4 4 5
b 12 5 2 3
a 13 1 1 0
b 13 3 2 4
I want to subtract by different x1 and same x2 condition So the result should look like
X2 X3 X4 X5
11 0 -1 -1
12 -1 2 2
13 -2 -1 -4
I can solve that question using for cycle but I prefer a dplyr solution for readable code. Also, I get to experience dplyr style code
Thanks to all viewers.
Upvotes: 1
Views: 1402
Reputation: 2419
data.table
version
library(data.table)
dt <- fread("
X1 X2 X3 X4 X5
a 11 2 3 4
b 11 2 4 5
a 12 4 4 5
b 12 5 2 3
a 13 1 1 0
b 13 3 2 4")
dt[,.SD[X1 == "a"] - .SD[X1 == "b"],.SDcols = 3:5,by = X2]
# or
dt[, lapply(.SD[.N:1], diff), .SDcols = 3:5, by = X2]
Result
X2 X3 X4 X5
1: 11 0 -1 -1
2: 12 -1 2 2
3: 13 -2 -1 -4
Upvotes: 1
Reputation: 118
Try this:
library(dplyr)
df <- data.frame( X1 = c("a","b","a","b","a","b"),
X2 = c(11,11,12,12,13,13),
X3 = c(2,2,4,5,1,3),
X4 = c(3,4,4,2,1,2),
X5 = c(4,5,5,3,0,4))
df$XMult <-with(df, case_when(X1=="a"~1, X1=="b"~-1))
df %>%
group_by( X2) %>%
summarize(across(.cols = c("X3", "X4", "X5"), ~sum(.x*XMult)))
Or, alternately, without modifying df
:
df %>%
group_by( X2) %>%
summarize(across(.cols = c("X3", "X4", "X5"),
~sum(.x*case_when(X1=="a"~1, X1=="b"~-1))))
Output:
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 3 x 4
X2 X3 X4 X5
<dbl> <dbl> <dbl> <dbl>
1 11 0 -1 -1
2 12 -1 2 2
3 13 -2 -1 -4
One more slightly-clumsy alternative that uses lag
instead of the diff
in Ben's answer:
df %>%
group_by( X2) %>%
mutate(across(.cols = X3:X5, ~(lag(.x) - .x))) %>%
filter( X1 == "b") %>%
select( -"X1")
Upvotes: 2
Reputation: 1147
This is @Ben's answer but modifying it slightly to get the result you specified:
Data:
library(dplyr)
df <- tibble::tribble(~ X1..X2..X3..X4..X5,
"a 11 2 3 4",
"b 11 2 4 5",
"a 12 4 4 5",
"b 12 5 2 3",
"a 13 1 1 0",
"b 13 3 2 4") %>%
separate(col = "X1..X2..X3..X4..X5",
into = c("x1", "x2", "x3", "x4", "x5")) %>%
mutate(across(2:5, as.numeric))
Wrangling with dplyr:
df %>%
group_by(x2) %>%
summarise(-across(x3:x5, diff))
Output:
# A tibble: 3 x 4
x2 x3 x4 x5
* <dbl> <dbl> <dbl> <dbl>
1 11 0 -1 -1
2 12 -1 2 2
3 13 -2 -1 -4
Upvotes: 3