I would like to read in a table and then use gsub to return part of the text. I know gsub requires a character vector. Instead of getting the desired samp list of 'C516_A1_B1' etc. and pat list of 'C516' etc., I get '1:5'. What is the simplest way to fix this? Thanks!
bamlist <- read.table('pathtotxtfile.txt')
for (y in bamlist) {
  samp <- gsub('EPICC_(C\\S+)_S1\\S+$','\\1', bamlist)
  pat <- gsub('(C\\d+)_\\S+$','\\1', samp)
}
bamlist:
EPICC_C516_A1_B1_S1-GRCh38.bam
EPICC_C516_A1_G4_S1-GRCh38.bam
EPICC_C516_B1_G7_S1-GRCh38.bam
EPICC_C516_B1_G8_S1-GRCh38.bam
EPICC_C516_B3_B1_S1-GRCh38.bam
Why loop? sub is vectorized over x, so a single call handles every element of a character vector.
samp <- sub("^[^_]*_(.*)_[^_]*$", "\\1", bamlist)
pat <- sub("(^[^_]+)_.*$", "\\1", samp)
samp
#[1] "C516_A1_B1" "C516_A1_G4" "C516_B1_G7" "C516_B1_G8"
#[5] "C516_B3_B1"
pat
#[1] "C516" "C516" "C516" "C516" "C516"
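To make the two patterns easier to follow, here is the same approach on a two-element vector, with each part of the regular expressions commented. The variable name files is only for illustration.

```r
# Two of the file names from the question, as a plain character vector
files <- c("EPICC_C516_A1_B1_S1-GRCh38.bam",
           "EPICC_C516_B3_B1_S1-GRCh38.bam")

# ^[^_]*_   matches the leading "EPICC_" (everything up to the first underscore)
# (.*)      greedily captures the middle part
# _[^_]*$   matches the last underscore and the suffix "S1-GRCh38.bam"
samp <- sub("^[^_]*_(.*)_[^_]*$", "\\1", files)

# (^[^_]+)  captures everything before the first underscore, i.e. the patient ID
pat <- sub("(^[^_]+)_.*$", "\\1", samp)

# sub() is applied to every element of the vector, so no loop is needed
samp
pat
```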
Data
bamlist <- scan(what = character(), text = "
EPICC_C516_A1_B1_S1-GRCh38.bam
EPICC_C516_A1_G4_S1-GRCh38.bam
EPICC_C516_B1_G7_S1-GRCh38.bam
EPICC_C516_B1_G8_S1-GRCh38.bam
EPICC_C516_B3_B1_S1-GRCh38.bam
")
Following user @akrun's comment, here is a way to apply the above code to a data.frame, which is what read.table returns.
lapply(bamlist, function(y){
samp <- sub("^[^_]*_(.*)_[^_]*$", "\\1", y)
pat <- sub("(^[^_]+)_.*$", "\\1", samp)
data.frame(samp = samp, pat = pat)
})
#$X
# samp pat
#1 C516_A1_B1 C516
#2 C516_A1_G4 C516
#3 C516_B1_G7 C516
#4 C516_B1_G8 C516
#5 C516_B3_B1 C516
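In the question, bamlist comes from read.table(), so it is a data.frame rather than a character vector. gsub() coerces it with as.character(), which yields one string per column instead of one per file name; that is most likely where the '1:5' comes from (before R 4.0.0, read.table() read text columns as factors by default). A minimal sketch of two fixes, writing the names to a temporary file as a stand-in for pathtotxtfile.txt and assuming the file has no header, so its single column gets the default name V1:

```r
# Write the example names to a temporary file, standing in for pathtotxtfile.txt
tf <- tempfile(fileext = ".txt")
writeLines(c("EPICC_C516_A1_B1_S1-GRCh38.bam",
             "EPICC_C516_A1_G4_S1-GRCh38.bam",
             "EPICC_C516_B1_G7_S1-GRCh38.bam",
             "EPICC_C516_B1_G8_S1-GRCh38.bam",
             "EPICC_C516_B3_B1_S1-GRCh38.bam"), tf)

# read.table() returns a data.frame; with no header its only column is V1
bamlist <- read.table(tf, stringsAsFactors = FALSE)
class(bamlist)                 # "data.frame"
length(as.character(bamlist))  # 1: coercion gives one string per column

# Option 1: extract the column as a character vector before calling sub()
files <- bamlist$V1

# Option 2: readLines() returns a character vector directly, one element per line
files2 <- readLines(tf)

samp <- sub("^[^_]*_(.*)_[^_]*$", "\\1", files)
pat  <- sub("(^[^_]+)_.*$", "\\1", samp)

unlink(tf)  # clean up the temporary file
```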
Here the data is a data.frame with one column, X:
X <- scan(what = character(), text = "
EPICC_C516_A1_B1_S1-GRCh38.bam
EPICC_C516_A1_G4_S1-GRCh38.bam
EPICC_C516_B1_G7_S1-GRCh38.bam
EPICC_C516_B1_G8_S1-GRCh38.bam
EPICC_C516_B3_B1_S1-GRCh38.bam
")
bamlist <- data.frame(X)
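With bamlist stored as a one-column data.frame like this, an alternative to lapply() is to add samp and pat as new columns, so the result stays a single data.frame instead of a list. A sketch, assuming the column is named X as above:

```r
# Recreate the one-column data.frame (column X) used in this example
X <- c("EPICC_C516_A1_B1_S1-GRCh38.bam", "EPICC_C516_A1_G4_S1-GRCh38.bam",
       "EPICC_C516_B1_G7_S1-GRCh38.bam", "EPICC_C516_B1_G8_S1-GRCh38.bam",
       "EPICC_C516_B3_B1_S1-GRCh38.bam")
bamlist <- data.frame(X, stringsAsFactors = FALSE)

# Add the extracted parts as new columns; sub() works on the whole column at once
bamlist$samp <- sub("^[^_]*_(.*)_[^_]*$", "\\1", bamlist$X)
bamlist$pat  <- sub("(^[^_]+)_.*$", "\\1", bamlist$samp)
bamlist
```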