Reputation: 169
I have a dataset that looks like this:
scaf pos ref A-1 A-2 A-3 A-4 B-1 B-2 B-3 B-4
MT1 11722 A 330 0 0 0 111 0 0 0
MT1 11723 T 0 230 0 8 0 18 0 2
MT1 11724 A 222 2 6 0 56 8 0 0
Here is what I want to do. For each row:
if ref = A, sum columns A-2 to A-4 and B-2 to B-4
if ref = T, sum columns A-1, A-3 and A4 together and B-1, B3 and B-4
if ref = C, sum columns A-1, A-2 and A4 together. Same with B columns
if ref = G, sum columns A-1 to A-3 and B1 to B3.
You would end up with:
scaf pos ref A B
MT1 11722 A 0 0
MT1 11723 T 8 2
MT1 11724 A 8 8
Any idea how I could do that? Keep in mind that I have about 100 columns and thousands of rows.
Thanks!
Upvotes: 1
Views: 69
Reputation: 20095
One option using `dplyr::case_when' can be as:
df %>% mutate(A = case_when(
ref == "A" ~ A.2+A.4,
ref == "T" ~ A.1+A.3+A.4,
ref == "C" ~ A.1+A.2+A.4,
ref == "G" ~ A.1+A.3
)) %>%
mutate(B = case_when(
ref == "A" ~ B.2+B.4,
ref == "T" ~ B.1+B.3+B.4,
ref == "C" ~ B.1+B.2+B.4,
ref == "G" ~ B.1+B.3
)) %>%
select(scaf, pos, ref, A, B)
# scaf pos ref A B
# 1 MT1 11722 A 0 0
# 2 MT1 11723 T 8 2
# 3 MT1 11724 A 2 8
Data:
df <- read.table(text =
"scaf pos ref A-1 A-2 A-3 A-4 B-1 B-2 B-3 B-4
MT1 11722 A 330 0 0 0 111 0 0 0
MT1 11723 T 0 230 0 8 0 18 0 2
MT1 11724 A 222 2 6 0 56 8 0 0",
header = TRUE, stringsAsFactors = FALSE)
Upvotes: 2