AP38
AP38

Reputation: 169

Sum columns based on condition

I have a dataset that looks like this:

scaf   pos   ref   A-1   A-2   A-3   A-4   B-1   B-2   B-3   B-4
MT1   11722   A    330    0     0     0    111    0    0     0
MT1   11723   T     0    230    0     8     0    18    0     2
MT1   11724   A    222    2     6     0    56     8    0     0

Here is what I want to do. For each row:

if ref = A, sum columns A-2 to A-4 and B-2 to B-4

if ref = T, sum columns A-1, A-3 and A4 together and B-1, B3 and B-4

if ref = C, sum columns A-1, A-2 and A4 together. Same with B columns

if ref = G, sum columns A-1 to A-3 and B1 to B3.

You would end up with:

scaf   pos   ref   A    B
MT1   11722   A    0    0
MT1   11723   T    8    2
MT1   11724   A    8    8

Any idea how I could do that? Keep in mind that I have about 100 columns and thousands of rows.

Thanks!

Upvotes: 1

Views: 69

Answers (1)

MKR
MKR

Reputation: 20095

One option using `dplyr::case_when' can be as:

df %>% mutate(A = case_when(
      ref == "A" ~ A.2+A.4,
      ref == "T" ~ A.1+A.3+A.4,
      ref == "C" ~ A.1+A.2+A.4,
      ref == "G" ~ A.1+A.3
                           )) %>% 
        mutate(B = case_when(
        ref == "A" ~ B.2+B.4,
        ref == "T" ~ B.1+B.3+B.4,
        ref == "C" ~ B.1+B.2+B.4,
        ref == "G" ~ B.1+B.3
                     )) %>%
      select(scaf, pos, ref, A, B)

#   scaf   pos ref A B
# 1  MT1 11722   A 0 0
# 2  MT1 11723   T 8 2
# 3  MT1 11724   A 2 8              

Data:

df <- read.table(text = 
"scaf   pos   ref   A-1   A-2   A-3   A-4   B-1   B-2   B-3   B-4
MT1   11722   A    330    0     0     0    111    0    0     0
MT1   11723   T     0    230    0     8     0    18    0     2
MT1   11724   A    222    2     6     0    56     8    0     0",
header = TRUE, stringsAsFactors = FALSE)

Upvotes: 2

Related Questions