alex
alex

Reputation: 844

Find a the value in one group that corresponds to the maximum value in another, then subtract that value from another column

I have data like these:


A1    A2     A3       A4       B      C1        C2         C3       C4 

1     3      2        2        7       2        NA          6       9
4     6      12       1        3       1        6           5       2   
6     1      NA       1        7       3        2           2       1

I want to take the maximum value of the columns starting with "C" and then subtract the "A" column ending with the same number from "B". For example, the max of the "C"s in the first row would be 9 and so I would want to subtract A4 from B (7-2)

A1    A2     A3       A4       B      C1        C2         C3       C4      new 

1     11     2        2        7       2        NA          6       9        5
4     6      12       1        3       1        6           5       2       -3
6     1      NA       1        7       3        2           2       1        1


Is this possible? Maybe using "starts_with"?

Upvotes: 3

Views: 52

Answers (3)

akrun
akrun

Reputation: 887213

Here is one with max.col

library(dplyr)
library(tidyr)
df1 %>%
   mutate( new = B- as.data.frame(across(starts_with('A')))[
        cbind(row_number(), max.col(across(starts_with('C'),
       replace_na, -5), 'first'))])

-output

   A1 A2 A3 A4 B C1 C2 C3 C4 new
1  1  3  2  2 7  2 NA  6  9   5
2  4  6 12  1 3  1  6  5  2  -3
3  6  1 NA  1 7  3  2  2  1   1

data

df1 <- structure(list(A1 = c(1L, 4L, 6L), A2 = c(3L, 6L, 1L), A3 = c(2L, 
12L, NA), A4 = c(2L, 1L, 1L), B = c(7L, 3L, 7L), C1 = c(2L, 1L, 
3L), C2 = c(NA, 6L, 2L), C3 = c(6L, 5L, 2L), C4 = c(9L, 2L, 1L
)), class = "data.frame", row.names = c(NA, -3L))

Upvotes: 1

Gregor Thomas
Gregor Thomas

Reputation: 145785

Here is a base solution:

cs = grep("C", names(df))
as = grep("A", names(df))

c_max = apply(df[cs], 1, which.max)
df$new = df$B - as.matrix(df[as])[cbind(1:nrow(df), c_max)]
df
#   A1 A2 A3 A4 B C1 C2 C3 C4 new
# 1  1  3  2  4 7  2 NA  6  9   3
# 2  4  6 12  1 3  1  6  5  2  -3
# 3  6  1 NA  1 7  3  2  2  1   1

Using this data (note that in your input your A values in input are different from the A values in the output... I used the input.)

df = read.table(text = 'A1    A2     A3       A4       B      C1        C2         C3       C4 
1     3      2        4        7       2        NA          6       9
4     6      12       1        3       1        6           5       2   
6     1      NA       1        7       3        2           2       1', header = T)

Upvotes: 2

Anoushiravan R
Anoushiravan R

Reputation: 21918

Maybe this could help:

library(dplyr)

df %>%
  rowwise() %>%
  mutate(idx = which.max(c_across(starts_with('C'))), 
         new = B - get(paste0('A', idx))) %>%
  select(-idx)


# A tibble: 3 × 10
# Rowwise: 
     A1    A2    A3    A4     B    C1    C2    C3    C4   new
  <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1     1     3     2     4     7     2    NA     6     9     3
2     4     6    12     1     3     1     6     5     2    -3
3     6     1    NA     1     7     3     2     2     1     1

Upvotes: 1

Related Questions