Hash

Reputation: 57

Split and join strings in a column using R

row.names X33 X40 X46 X50 X60 X80 X90 X100 X130 X200
INTERGENIC-chrI-188881-G-C  0.37    0.41    0.48    0.45    0.47    0.42    0.45    0.44    0.40    0.36    0.36    0.33    0.39
INTERGENIC-chrI-188939-A-G  0.38    0.48    0.56    0.54    0.57    0.45    0.57    0.51    0.49    0.47    0.41    0.38    0.52
INTERGENIC-chrXIII-191990-A-T   0.14    0.15    0.15    0.22    0.16    0.16    0.15    0.31    0.11    0.23    0.12    0.12    0.19
SDS3-chrIX-202625-T-G   0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.33    0.35    0.41    0.50    0.53
INTERGENIC-chrVIII-236987-G-A   0.15    0.34    0.28    0.28    0.22    0.12    0.15    0.25    0.27    0.24    0.13    0.13    0.25
INTERGENIC-chrVIII-236993-T-A   0.12    0.25    0.21    0.19    0.17    0.00    0.00    0.18    0.16    0.17    0.08    0.11    0.12

I have a matrix with dimensions [34, 11] and these column names:

row.names X33 X40 X46 X50 X60 X80 X90 X100 X130 X200 

The first column, row.names, needs to be split into individual columns:

GENE, CHROMOSOME, POSITION, REF, ALT (INTERGENIC-chrI-188881-G-C)

I am currently using R to get the desired output.

library(reshape2)    
A <- colsplit(df$row.names, "\\-", names=c("GENE", "CHROMOSOME", "POSITION", "REF", "ALT"))

I would like to perform this operation for every row in the matrix. Could anyone help me output a matrix like the one above, with the new columns added? Thanks.

Upvotes: 1

Views: 426

Answers (1)

zx8754

Reputation: 56149

Try this:

#dummy data
df <- read.table(text="
Variants    33  40  46  50  60  80  90  100 130 200
ASG1-chrIX-103160-C-T   0   0   0   0   0   0.83    0.49    0   0   0
YNL179C-chrXIV-300948-T-A   0.27    0.32    0.24    0.25    0.23    0.22    0.17    0.16    0.2 0.3
",header=TRUE)

#split columns
temp_df <- 
  do.call(rbind,
          strsplit(as.character(df$Variants),split="-"))
colnames(temp_df) <- c("GENE", "CHROMOSOME", "POSITION", "REF", "ALT")

#result
cbind(temp_df,df[,-1])

# GENE CHROMOSOME POSITION REF ALT  X33  X40  X46  X50  X60  X80  X90 X100 X130 X200
# 1    ASG1      chrIX   103160   C   T 0.00 0.00 0.00 0.00 0.00 0.83 0.49 0.00  0.0  0.0
# 2 YNL179C     chrXIV   300948   T   A 0.27 0.32 0.24 0.25 0.23 0.22 0.17 0.16  0.2  0.3
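
One thing to watch with the result above: `strsplit` returns character vectors, so `POSITION` comes back as text (or as a factor in R versions before 4.0) rather than a number. It may also be worth checking that every identifier really has five dash-separated fields, since a gene name that itself contains a hyphen would shift the columns. A minimal sketch under those assumptions; the names `ids`, `parts`, and `out` are illustrative and not from the original post:

```r
# toy data in the same shape as the example above (only two sample columns)
df <- read.table(text = "
Variants 33 40
ASG1-chrIX-103160-C-T 0 0.83
YNL179C-chrXIV-300948-T-A 0.27 0.32
", header = TRUE, stringsAsFactors = FALSE)

ids <- as.character(df$Variants)
parts <- strsplit(ids, split = "-", fixed = TRUE)

# guard: rbind would recycle values if a row had a different number of fields
stopifnot(all(lengths(parts) == 5))

out <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
colnames(out) <- c("GENE", "CHROMOSOME", "POSITION", "REF", "ALT")

# POSITION is character after the split; convert it explicitly
out$POSITION <- as.integer(out$POSITION)

# drop the original identifier column and attach the sample columns
out <- cbind(out, df[, -1, drop = FALSE])
str(out)
```

Using `fixed = TRUE` treats the dash as a literal character, so no regular-expression escaping is needed.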

EDIT:

The code you provided works, too:

#using reshape2
library(reshape2)
A <- colsplit(df$Variants, "\\-", names=c("GENE", "CHROMOSOME", "POSITION", "REF", "ALT"))

#result
cbind(A,df[,-1])
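
If the identifiers are stored as the row names of the object rather than in a regular column (a header displayed as `row.names` often suggests this, but that is an assumption about the asker's data), there is no `row.names` column to pass to the split; `rownames(df)` can be split instead. A sketch using the first two rows and the first two sample columns from the question:

```r
# identifiers live in the row names, not in a column (assumed layout)
df <- data.frame(X33 = c(0.37, 0.38), X40 = c(0.41, 0.48),
                 row.names = c("INTERGENIC-chrI-188881-G-C",
                               "INTERGENIC-chrI-188939-A-G"))

# split the row names on the literal dash and stack into a character matrix
parts <- do.call(rbind, strsplit(rownames(df), "-", fixed = TRUE))
colnames(parts) <- c("GENE", "CHROMOSOME", "POSITION", "REF", "ALT")

# keep every sample column here, since no identifier column needs dropping
out <- cbind(as.data.frame(parts, stringsAsFactors = FALSE), df)
rownames(out) <- NULL  # the long identifiers are now redundant
out
```

The same `rownames()` call works on a plain matrix as well as on a data frame.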

Upvotes: 1
