I have strings of DNA sequences such as: "ACGTTATATTTATGTTTTGGGATTTTAGCAGGAATGATTGGTACTGCTTTCAGTATGTTAATTAGATTAGAGTTATCGGGACCGGGATCAATGTTAGGGGATATCATTTATACAATGTTATTGTTACTGCTCATGCTTTTGTTATGATTTTTTTTTTAGTAATGCCTGTGATGATTGGGGGGTTTGGGAATTGGTTAGTACCATTATATATTGGTGCCCCAGATATGGCATTCCCTCGATTAAATAATATAAGTTTTTGATTATTACCGCCGGCTTTAAG"
Is there a way in R to remove the letters at specific positions, e.g. the letter at position 20?
I think I could use a regex, but I can't get the expression right.
Thanks
One option is to capture the first 19 characters, drop the 20th, and capture the remaining characters:
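# keep the first 19 characters (\\1), drop the 20th, keep the rest (\\2);
# .{19} (rather than .{1,19}) leaves strings shorter than 20 characters unchanged
str2 <- sub("^(.{19}).(.*)", "\\1\\2", str1)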
Or, with a single capture group (sub only replaces the matched part, so the rest of the string is kept as is):
sub("^(.{19}).", "\\1", str1)
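The single-capture-group pattern can be built for any position n. This is a minimal sketch, assuming one position per call; drop_char_regex is a hypothetical helper name, not part of base R. perl = TRUE is used as a precaution, since R's default regex engine may reject large repetition counts such as {279}, which a long sequence could need.

```r
# Hypothetical helper (not part of base R): remove the character at
# position n from every string in x, using the single-capture-group idea.
drop_char_regex <- function(x, n) {
  stopifnot(length(n) == 1, n >= 1)
  # "^(.{n-1})." captures the first n - 1 characters and matches the nth;
  # strings shorter than n do not match and are returned unchanged.
  pattern <- sprintf("^(.{%d}).", n - 1)
  # perl = TRUE as a precaution: the default engine may reject large {counts}
  sub(pattern, "\\1", x, perl = TRUE)
}

drop_char_regex("ACGTTATATT", 3)
#[1] "ACTTATATT"
```

Because sub is vectorised over x, the same call works on a whole vector of sequences, as long as they all lose the same position.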
Another option is the replacement form of str_sub from stringr, which assigns an empty string to characters 20 to 20:
library(stringr)
nchar(str1)
#[1] 280
str_sub(str1, 20, 20) <- ""  # replace the 20th character with an empty string
nchar(str1)
#[1] 279
Data (define this first):
str1 <- "ACGTTATATTTATGTTTTGGGATTTTAGCAGGAATGATTGGTACTGCTTTCAGTATGTTAATTAGATTAGAGTTATCGGGACCGGGATCAATGTTAGGGGATATCATTTATACAATGTTATTGTTACTGCTCATGCTTTTGTTATGATTTTTTTTTTAGTAATGCCTGTGATGATTGGGGGGTTTGGGAATTGGTTAGTACCATTATATATTGGTGCCCCAGATATGGCATTCCCTCGATTAAATAATATAAGTTTTTGATTATTACCGCCGGCTTTAAG"
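If you would rather not load stringr, the same deletion can be sketched in base R by pasting together the part of the string before the position and the part after it. drop_char_substr is a hypothetical name used only for illustration, and the sketch assumes n is a position of at least 1.

```r
# Hypothetical helper: a base R equivalent of str_sub(x, n, n) <- "".
drop_char_substr <- function(x, n) {
  before <- substr(x, 1, n - 1)   # "" when n is 1
  after  <- substring(x, n + 1)   # "" when n is at or past the last character
  paste0(before, after)
}

drop_char_substr("ACGTTATATT", 3)
#[1] "ACTTATATT"
```

Unlike the str_sub replacement form, this returns a new value instead of reassigning str1, so you would write str1 <- drop_char_substr(str1, 20) to get the same effect.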
Alternatively, without a regular expression (and probably less straightforward than @akrun's answer), you can use strsplit to split the string into individual characters, remove the 20th, and paste the rest back together.
seq <- "ACGTTATATTTATGTTTTGGGATTTTAGCAGGAATGATTGGTACTGCTTTCAGTATGTTAATTAGATTAGAGTTATCGGGACCGGGATCAATGTTAGGGGATATCATTTATACAATGTTATTGTTACTGCTCATGCTTTTGTTATGATTTTTTTTTTAGTAATGCCTGTGATGATTGGGGGGTTTGGGAATTGGTTAGTACCATTATATATTGGTGCCCCAGATATGGCATTCCCTCGATTAAATAATATAAGTTTTTGATTATTACCGCCGGCTTTAAG"
nchar(seq)
#[1] 280
# split into single characters, drop the 20th, and rejoin
seq2 <- paste(unlist(strsplit(seq, ""))[-20], collapse = "")
nchar(seq2)
#[1] 279
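Since the question mentions positions in the plural, the strsplit approach extends to several positions at once, because negative indexing can drop a whole vector of indices. This is a sketch assuming the positions refer to the original string; drop_positions is a hypothetical helper name.

```r
# Hypothetical helper: remove several positions at once using strsplit.
# Positions refer to the ORIGINAL string, so deleting them together avoids
# the index shift you get when removing one character at a time.
drop_positions <- function(x, pos) {
  stopifnot(all(pos >= 1))
  if (length(pos) == 0) return(x)  # chars[-integer(0)] would drop everything
  vapply(strsplit(x, ""), function(chars) {
    paste(chars[-pos], collapse = "")  # out-of-range positions are ignored
  }, character(1))
}

drop_positions("ACGTTATATT", c(3, 5))
#[1] "ACTATATT"
```

Removing the same two positions one call at a time would give a different result: after deleting position 3, the original 6th character has moved to position 5.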