hijfz

Reputation: 39

Is there a way to remove a character by index from a string in R?

I have strings of DNA sequences such as: "ACGTTATATTTATGTTTTGGGATTTTAGCAGGAATGATTGGTACTGCTTTCAGTATGTTAATTAGATTAGAGTTATCGGGACCGGGATCAATGTTAGGGGATATCATTTATACAATGTTATTGTTACTGCTCATGCTTTTGTTATGATTTTTTTTTTAGTAATGCCTGTGATGATTGGGGGGTTTGGGAATTGGTTAGTACCATTATATATTGGTGCCCCAGATATGGCATTCCCTCGATTAAATAATATAAGTTTTTGATTATTACCGCCGGCTTTAAG"

Is there a way I can remove the letters at specific positions e.g. at position 20 in R?

I think I may be able to use regex but I don't think I am getting the expression right.

Thanks

Upvotes: 1

Views: 4760

Answers (2)

akrun

Reputation: 887691

One option is to capture the first 19 characters, match (and so drop) the 20th character, and capture the remaining characters:

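# An exact count {19} leaves strings shorter than 20 characters unchanged;
# {1,19} would backtrack and remove the last character of a shorter string.
str2 <- sub("^(.{19}).(.*)", "\\1\\2", str1)

Or with a single capture group

sub("^(.{19}).", "\\1", str1)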
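The regex approach can be generalized to any position by building the pattern from the index. The following is only a sketch: remove_at_regex is a hypothetical helper name, not something from the answer, and perl = TRUE is an added assumption so that the PCRE engine handles the repetition count (PCRE allows counts up to 65535; positions beyond that would need a non-regex approach).

```r
# Hypothetical helper (not part of the original answer):
# remove the character at 1-based position n using a regex built with sprintf().
remove_at_regex <- function(x, n) {
  stopifnot(length(n) == 1, n >= 1, n <= 65536)
  # For n = 20 this builds "^(.{19})." : keep the first 19 characters, drop the next one.
  pattern <- sprintf("^(.{%d}).", n - 1)
  # Strings shorter than n do not match the pattern and are returned unchanged.
  sub(pattern, "\\1", x, perl = TRUE)
}

remove_at_regex("ACGTACGT", 3)
#[1] "ACTACGT"
```

Because sub is vectorized over x, the same call works on a character vector of sequences.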

Another option is str_sub from the stringr package, which supports replacement by position:

library(stringr)
nchar(str1)
#[1] 280
str_sub(str1, 20, 20) <- ""
nchar(str1)
#[1] 279
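If adding a package dependency is not desirable, a similar positional removal can be sketched in base R with substr and paste0. The helper name remove_at is hypothetical and only used for illustration.

```r
# Hypothetical helper (not part of the original answer):
# remove the character at 1-based position n using base R only.
remove_at <- function(x, n) {
  # Keep everything before position n and everything after it, then join the two parts.
  # substr() returns "" when the requested range is empty, so n = 1 and
  # n = nchar(x) need no special handling.
  paste0(substr(x, 1, n - 1), substr(x, n + 1, nchar(x)))
}

remove_at("ACGTACGT", 3)
#[1] "ACTACGT"
```

Unlike the str_sub assignment form, this returns a new string and leaves the input untouched.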

data

str1 <- "ACGTTATATTTATGTTTTGGGATTTTAGCAGGAATGATTGGTACTGCTTTCAGTATGTTAATTAGATTAGAGTTATCGGGACCGGGATCAATGTTAGGGGATATCATTTATACAATGTTATTGTTACTGCTCATGCTTTTGTTATGATTTTTTTTTTAGTAATGCCTGTGATGATTGGGGGGTTTGGGAATTGGTTAGTACCATTATATATTGGTGCCCCAGATATGGCATTCCCTCGATTAAATAATATAAGTTTTTGATTATTACCGCCGGCTTTAAG"

Upvotes: 8

dc37

Reputation: 16178

Alternatively, without regex (and probably less straightforward than @akrun's answer), you can use strsplit to split the string into a vector of single characters, remove the 20th element, and paste the rest back together.

seq <- "ACGTTATATTTATGTTTTGGGATTTTAGCAGGAATGATTGGTACTGCTTTCAGTATGTTAATTAGATTAGAGTTATCGGGACCGGGATCAATGTTAGGGGATATCATTTATACAATGTTATTGTTACTGCTCATGCTTTTGTTATGATTTTTTTTTTAGTAATGCCTGTGATGATTGGGGGGTTTGGGAATTGGTTAGTACCATTATATATTGGTGCCCCAGATATGGCATTCCCTCGATTAAATAATATAAGTTTTTGATTATTACCGCCGGCTTTAAG"

nchar(seq)
#[1] 280

# Split into single characters, drop the 20th, and collapse back into one string
seq2 <- paste(unlist(strsplit(seq, ""))[-20], collapse = "")
nchar(seq2)
#[1] 279
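Since the question asks about removing letters at specific positions, the strsplit approach extends naturally to several indices at once. This is a minimal sketch; remove_positions is a hypothetical helper name, and the bounds filtering and empty-index guard are assumptions about the desired behavior.

```r
# Hypothetical helper (not part of the original answer):
# remove the characters at several 1-based positions from a single string.
remove_positions <- function(x, positions) {
  chars <- strsplit(x, "")[[1]]
  # Ignore positions outside the string (assumed behavior).
  positions <- positions[positions >= 1 & positions <= length(chars)]
  # Guard: chars[-integer(0)] would select nothing and return an empty string.
  if (length(positions) == 0) return(x)
  paste(chars[-positions], collapse = "")
}

remove_positions("ACGTACGT", c(2, 5))
#[1] "AGTCGT"
```

All positions refer to the original string, so removing several characters in one call avoids the index shifting that repeated single removals would cause.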

Upvotes: 4
