Reza

Reputation: 319

Subtracting rows nested in a data.frame based on column value correspondence

Within each unique value of study, how can I subtract the yi of the rows with group == "C" from the yi of the corresponding rows with group != "C", matching them on interval_id?

For example, in study == 1, yi == .4 for group == "C" on interval_id == 0 should be subtracted from yi == .1 for group == "T1" on interval_id == 0.

Similarly, in study == 1, yi == .5 for group == "C" on interval_id == 1 should be subtracted from yi == .3 for group == "T1" on interval_id == 1.

The final output should be a data.frame with the group == "C" rows deleted (shown below).

m = "
study group  yi  vi interval_id obs
1      T1    .1  1  0           1
1      T1    .3  2  1           2
1      C     .4  3  0           3
1      C     .5  4  1           4
2      T2    .6  5  0           5
2      C     .9  6  1           6
"

data <- read.table(text = m, header = TRUE)

# DESIRED OUTPUT:
"
study group  yi  vi interval_id obs
1      T1    -.3  .  0           1
1      T1    -.2  .  1           2
2      T2    -.3  .  0           5
2      C      .9  .  1           6
"

Upvotes: 0

Views: 398

Answers (2)

akrun

Reputation: 887193

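We could filter the control rows, join them back to the non-control rows by study and row position within study, and then do the subtraction

library(dplyr)
library(data.table)  # for rowid()
data %>%
    # keep the control rows and rename their yi
    filter(group == 'C') %>% 
    select(study, yi2 = yi) %>%
    # index rows within each study so that rows pair up by position
    mutate(rn = rowid(study)) %>% 
    right_join(data %>% 
         filter(group != 'C') %>%
         mutate(rn = rowid(study)), by = c('study', 'rn')) %>%
    # subtract the control value, then drop the helper column
    mutate(yi = yi - yi2, yi2 = NULL)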

-output

 study rn group   yi vi interval_id obs
1     1  1    T1 -0.3  1           0   1
2     1  2    T1 -0.2  2           1   2
3     2  1    T2 -0.3  5           0   5
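
The same filter, join, and subtract steps can also be sketched in base R, in case loading dplyr and data.table is not wanted. This is only a minimal sketch: the helper column rn and the name yi_c are illustrative choices, and it assumes, like the code above, that control and non-control rows pair up by their order within each study.

```r
# Sample data from the question
data <- read.table(text = "
study group  yi  vi interval_id obs
1      T1    .1  1  0           1
1      T1    .3  2  1           2
1      C     .4  3  0           3
1      C     .5  4  1           4
2      T2    .6  5  0           5
2      C     .9  6  1           6
", header = TRUE)

# Split into control and non-control rows
ctrl  <- data[data$group == "C", c("study", "yi")]
treat <- data[data$group != "C", ]
names(ctrl)[names(ctrl) == "yi"] <- "yi_c"

# Row position within each study (illustrative helper column `rn`)
ctrl$rn  <- ave(seq_along(ctrl$study),  ctrl$study,  FUN = seq_along)
treat$rn <- ave(seq_along(treat$study), treat$study, FUN = seq_along)

# Keep every non-control row and attach its control value by position
res <- merge(treat, ctrl, by = c("study", "rn"), all.x = TRUE)
res$yi <- res$yi - res$yi_c

# Restore the original row order and columns
res <- res[order(res$obs), c("study", "group", "yi", "vi", "interval_id", "obs")]
res
```

Because all.x = TRUE keeps every non-control row, a row without a control at the same position would come back with yi set to NA rather than being dropped.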

Or we could reshape to 'wide' format and then do the subtraction

library(tidyr)
data %>%
    mutate(new = c('NotC', 'C')[1 + (group == 'C')], 
      rn = rowid(study, new)) %>% 
   select(study, rn, new, yi) %>%
   pivot_wider(names_from = new, values_from = yi) %>% 
   transmute(yi = NotC - C) %>% 
   pull(yi) %>%
   mutate(data %>% 
      filter(group != 'C'), yi = .)

-output

 study group   yi vi interval_id obs
1     1    T1 -0.3  1           0   1
2     1    T1 -0.2  2           1   2
3     2    T2 -0.3  5           0   5

Upvotes: 1

Ronak Shah

Reputation: 388992

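Within every study, we can subtract the yi values where group == 'C' from the yi values where group != 'C'. Finally, drop the rows where group == 'C'. Note that rep(..., 2) only works when each study has as many 'C' rows as non-'C' rows, paired in the same order.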

library(dplyr)

data %>%
  group_by(study) %>%
  mutate(yi = rep(yi[group != 'C'] - yi[group == 'C'], 2)) %>%
  ungroup() %>%
  filter(group != 'C')

#  study group    yi    vi interval_id   obs
#  <int> <chr> <dbl> <int>       <int> <int>
#1     1 T1     -0.3     1           0     1
#2     1 T1     -0.2     2           1     2
#3     2 T2     -0.3     5           0     5 
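
If the rows should instead be paired strictly on interval_id, as the wording of the question suggests, one possible sketch is a left join keyed on study and interval_id. The names ctrl, yi_c and res are illustrative, and this is an alternative reading of the question rather than what the code above does.

```r
library(dplyr)

# Sample data from the question
data <- read.table(text = "
study group  yi  vi interval_id obs
1      T1    .1  1  0           1
1      T1    .3  2  1           2
1      C     .4  3  0           3
1      C     .5  4  1           4
2      T2    .6  5  0           5
2      C     .9  6  1           6
", header = TRUE)

# Control values, keyed by study and interval_id
ctrl <- data %>%
  filter(group == "C") %>%
  select(study, interval_id, yi_c = yi)

# Attach the matching control value to each non-control row and subtract
res <- data %>%
  filter(group != "C") %>%
  left_join(ctrl, by = c("study", "interval_id")) %>%
  mutate(yi = yi - yi_c) %>%
  select(-yi_c)

res
```

With the sample data, study 2 has its T2 row at interval_id 0 and its C row at interval_id 1, so a strict match leaves that yi as NA, whereas the position-based code above returns -0.3 for it.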

Upvotes: 2
