How to expand data matrix for corresponding column names

Question

I have a data matrix called mymat. It has .GT columns for samples 00860 and 00861. I want to expand this matrix with a new .AD column for each sample. Each .AD column should contain 50,0 if .GT is 0/0, 25/25 if .GT is 0/1, and 0,50 if .GT is 1/1. I also want to add a .DP column for each sample, with the value 50 in every row, so that I get the result below. How can I do this kind of conditional expansion of a matrix in R?

mymat <- structure(c("0/1", "1/1", "0/0", "0/0"), .Dim = c(2L, 2L),
                   .Dimnames = list(c("chr1:1163804", "chr1:1888193"),
                                    c("00860.GT", "00861.GT")))

Desired result:

             00860.GT 00860.AD 00860.DP 00861.GT 00861.AD 00861.DP
chr1:1163804 0/1      25/25    50       0/0      50,0     50
chr1:1888193 1/1      0/50     50       0/0      50,0     50

jav · Accepted Answer

Here's a data.table solution, with each line commented. It is written to handle any number of .GT columns in your mymat object. Briefly:

1) First, convert the matrix to a data.table, keeping the row names as a column. This works for any number of columns, assuming they follow the same <sample>.GT naming.

2) Find all of the ".GT" columns and extract the sample ID before ".GT".

3) Create a ".DP" column for each ".GT" column found.

4) Build the "GT" to "AD" mapping as a named vector: the values are the "to" part of the mapping and the names are the "from" part.

5) Use data.table's .SDcols argument to apply the "GT" to "AD" mapping to every ".GT" column at once, creating the ".AD" columns.
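Steps 2 and 4 are the core of the approach and can be checked on their own. The following is a minimal base R sketch; the names `ids`, `gt`, `gt_to_ad` and `missing_ad` are illustrative and not part of the answer, and it assumes the GT values are exactly 0/0, 0/1 and 1/1, using the same mapping as the answer's output.

```r
# Step 2: strip a trailing ".GT" to get the sample IDs (the dot is escaped to match literally)
ids <- sub("\\.GT$", "", c("00860.GT", "00861.GT"))
# ids is c("00860", "00861")

# Step 4: a named vector whose names are GT codes and whose values are AD strings
gt_to_ad <- c("0/0" = "50,0", "0/1" = "25/25", "1/1" = "0/50")

# Indexing by the GT strings looks each value up by name
gt <- c("0/1", "1/1", "0/0", "0/0")
ad <- unname(gt_to_ad[gt])
# ad is c("25/25", "0/50", "50,0", "50,0")

# A genotype missing from the lookup (for example "./.") yields NA
missing_ad <- unname(gt_to_ad["./."])
```

Because the lookup is by name, the order of the entries in the mapping vector does not matter, and an unexpected genotype shows up as NA instead of being silently mapped to something else.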

# Your matrix
mymat <- structure(c("0/1", "1/1", "0/0", "0/0"), .Dim = c(2L, 2L), 
                   .Dimnames = list(c("chr1:1163804", "chr1:1888193"), 
                    c("00860.GT", "00861.GT")))

# Using a data table approach
library(data.table)

# Convert to a data.table; the row names become a column called 'rn'
mymat = as.data.table(mymat, keep.rownames = TRUE)

# Find the ".GT" columns (anchored regex, so only names ending in ".GT" match)
GTcols = grep("\\.GT$", colnames(mymat))

# Get the sample ID before ".GT"; the dot is escaped so it matches a literal "."
selectedCols = sub("\\.GT$", "", colnames(mymat)[GTcols])

selectedCols
# [1] "00860" "00861"

# Create the ".DP" columns; wrapping the LHS in parentheses makes := treat it as a vector of column names
mymat[, (paste0(selectedCols, ".DP")) := 50]

mymat
#              rn 00860.GT 00861.GT 00860.DP 00861.DP
# 1: chr1:1163804      0/1      0/0       50       50
# 2: chr1:1888193      1/1      0/0       50       50

# Create the "GT" to "AD" mapping: values are the AD strings, names are the GT codes
GTToADMapping = c("50,0", "25/25", "0/50")
names(GTToADMapping) = c("0/0", "0/1", "1/1")

GTToADMapping
#    0/0     0/1     1/1 
# "50,0" "25/25"  "0/50" 

# Return the "AD" value for each "GT" value; unname() drops the lookup names
mapGTToAD <- function(x) {
  unname(GTToADMapping[x])
}

# Create the ".AD" columns by applying the mapping to each ".GT" column
mymat[, (paste0(selectedCols, ".AD")) := lapply(.SD, mapGTToAD),
      .SDcols = colnames(mymat)[GTcols]]

mymat
#              rn 00860.GT 00861.GT 00860.DP 00861.DP 00860.AD 00861.AD
# 1: chr1:1163804      0/1      0/0       50       50    25/25     50,0
# 2: chr1:1888193      1/1      0/0       50       50     0/50     50,0

# Reorder the columns as GT, AD, DP for each sample
# (rbind + as.vector interleaves the names sample by sample)
colOrder = as.vector(rbind(paste0(selectedCols, ".GT"),
                           paste0(selectedCols, ".AD"),
                           paste0(selectedCols, ".DP")))
mymat = mymat[, c("rn", colOrder), with = FALSE]

mymat
#              rn 00860.GT 00860.AD 00860.DP 00861.GT 00861.AD 00861.DP
# 1: chr1:1163804      0/1    25/25       50      0/0     50,0       50
# 2: chr1:1888193      1/1     0/50       50      0/0     50,0       50

# Convert back to a matrix with the original row names
mymat2 = as.matrix(mymat[, -1, with = FALSE])
rownames(mymat2) = mymat$rn

mymat2
#              00860.GT 00860.AD 00860.DP 00861.GT 00861.AD 00861.DP
# chr1:1163804 "0/1"    "25/25"  "50"     "0/0"    "50,0"   "50"    
# chr1:1888193 "1/1"    "0/50"   "50"     "0/0"    "50,0"   "50"
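If you would rather not depend on data.table, the same expansion can be done in base R directly on the matrix. This is a minimal sketch, not part of the accepted answer; it assumes every sample has exactly one column named `<sample>.GT` and that GT values are limited to 0/0, 0/1 and 1/1 (anything else would get NA in the AD column). The names `gt_to_ad`, `samples`, `blocks` and `expanded` are illustrative.

```r
mymat <- structure(c("0/1", "1/1", "0/0", "0/0"), .Dim = c(2L, 2L),
                   .Dimnames = list(c("chr1:1163804", "chr1:1888193"),
                                    c("00860.GT", "00861.GT")))

# Same GT -> AD lookup as in the answer
gt_to_ad <- c("0/0" = "50,0", "0/1" = "25/25", "1/1" = "0/50")

# Sample IDs from the ".GT" column names
gt_cols <- grep("\\.GT$", colnames(mymat), value = TRUE)
samples <- sub("\\.GT$", "", gt_cols)

# For each sample, bind its GT, AD and DP columns side by side
blocks <- lapply(samples, function(s) {
  gt <- mymat[, paste0(s, ".GT")]
  block <- cbind(gt, unname(gt_to_ad[gt]), rep("50", length(gt)))
  colnames(block) <- paste0(s, c(".GT", ".AD", ".DP"))
  block
})

# Join the per-sample blocks and restore the original row names
expanded <- do.call(cbind, blocks)
rownames(expanded) <- rownames(mymat)
expanded
```

Building each sample's three columns together means the GT, AD, DP order comes out right without a separate reordering step. DP is stored as the string "50" because a character matrix holds a single type anyway, which matches the final mymat2 above.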
