MDC
MDC

Reputation: 27

Substract one row from another with conditions

I have a table like this:

  treatment individual phase dist_mean  track
1   control          1   pre     13.33 569.99
2   control          1  post     10.95 624.65
3   control          2   pre      9.93 363.35
4   control          2  post     10.11 339.88
5   control          3   pre     12.00 676.42
6   control          3  post     12.80 939.15

In principle, two rows are always paired. I need to subtract dist_mean of the post-phase from the pre-phase of a sample. The easy way would be to subtract row 2 from 1 and so on. But given the possibility that this order is disturbed at any point, the whole calculation would go wrong. That's why I would like to have the calculation under the conditions, that the treatment and the individual of both phases match. Info: the treatment changes. It's not always control.

Upvotes: 2

Views: 57

Answers (4)

ThomasIsCoding
ThomasIsCoding

Reputation: 101343

A data.table option

setDT(df)[
  order(treatment, individual, phase)
][
  ,
  setNames(lapply(.SD, diff), paste0("diff_", names(.SD))),
  by = .(treatment, individual),
  .SDcols = c("dist_mean", "track")
]

gives

   treatment individual diff_dist_mean diff_track
1:   control          1           2.38     -54.66
2:   control          2          -0.18      23.47
3:   control          3          -0.80    -262.73

A base R option using reshape

transform(
  reshape(
    df,
    direction = "wide",
    idvar = c("treatment", "individual"),
    timevar = "phase"
  ),
  diff_dist_mean = dist_mean.pre - dist_mean.post,
  diff_track = track.pre - track.post
)

gives

  treatment individual dist_mean.pre track.pre dist_mean.post track.post
1   control          1         13.33    569.99          10.95     624.65
3   control          2          9.93    363.35          10.11     339.88
5   control          3         12.00    676.42          12.80     939.15
  diff_dist_mean diff_track
1           2.38     -54.66
3          -0.18      23.47
5          -0.80    -262.73

Upvotes: 2

zx8754
zx8754

Reputation: 56159

Using data.table, reshape long-to-wide, then get difference in post/pre columns:

library(data.table)

setDT(df1)

dcast(df1, treatment + individual ~ phase, value.var = c("dist_mean", "track")
      )[, .(treatment, individual,
            diff_dist_mean = dist_mean_post - dist_mean_pre,
            diff_track = track_post - track_pre)]
#    treatment individual diff_dist_mean diff_track
# 1:   control          1          -2.38      54.66
# 2:   control          2           0.18     -23.47
# 3:   control          3           0.80     262.73

Upvotes: 0

Yuriy Saraykin
Yuriy Saraykin

Reputation: 8880

df <- read.table(text = "  treatment individual phase dist_mean  track
1   control          1   pre     13.33 569.99
2   control          1  post     10.95 624.65
3   control          2   pre      9.93 363.35
4   control          2  post     10.11 339.88
5   control          3   pre     12.00 676.42
6   control          3  post     12.80 939.15", header = T)

library(tidyverse)
df %>% 
  pivot_wider(c(treatment, individual), names_from = phase, values_from = dist_mean) %>% 
  mutate(d = post - pre)
#> # A tibble: 3 x 5
#>   treatment individual   pre  post      d
#>   <chr>          <int> <dbl> <dbl>  <dbl>
#> 1 control            1 13.3   11.0 -2.38 
#> 2 control            2  9.93  10.1  0.180
#> 3 control            3 12     12.8  0.8

Created on 2021-03-09 by the reprex package (v1.0.0)

data.table

df <- read.table(text = "  treatment individual phase dist_mean  track
1   control          1   pre     13.33 569.99
2   control          1  post     10.95 624.65
3   control          2   pre      9.93 363.35
4   control          2  post     10.11 339.88
5   control          3   pre     12.00 676.42
6   control          3  post     12.80 939.15", header = T)
library(data.table)
setDT(df)
res <- dcast(data = df, formula = treatment + individual ~ phase, value.var = "dist_mean")[, d := post - pre]
head(res)
#>    treatment individual  post   pre     d
#> 1:   control          1 10.95 13.33 -2.38
#> 2:   control          2 10.11  9.93  0.18
#> 3:   control          3 12.80 12.00  0.80

Created on 2021-03-09 by the reprex package (v1.0.0)

Upvotes: 0

Rui Barradas
Rui Barradas

Reputation: 76402

Use aggregate:

aggregate(dist_mean ~ treatment + individual, df1, function(x) diff(rev(x)))
#  treatment individual dist_mean
#1   control          1      2.38
#2   control          2     -0.18
#3   control          3     -0.80

Data

df1 <- read.table(text = "
  treatment individual phase dist_mean  track
1   control          1   pre     13.33 569.99
2   control          1  post     10.95 624.65
3   control          2   pre      9.93 363.35
4   control          2  post     10.11 339.88
5   control          3   pre     12.00 676.42
6   control          3  post     12.80 939.15
", header = TRUE)

Upvotes: 1

Related Questions