Reputation: 704
I want to remove the -5p part of my data below:
[1] mmu-miR-322-5p mmu-miR-10b-5p mmu-miR-10a-5p
I tried gsub(".-5p","",data), but it removed more than just the -5p.
Upvotes: 1
Views: 224
Reputation: 174696
The . in your pattern also matches the character before -5p, i.e. your regex .-5p matches 2-5p, b-5p, and a-5p in the input vector. Because the string -5p sits at the end of each element, you can put the end-of-line anchor $ after -5p. Note that this removes only a -5p at the end of an element; a -5p in the middle or at the start is left untouched.
sub("-5p$","",data)
Example:
> s <- c("mmu-miR-322-5p", "mmu-miR-10b-5p", "mmu-miR-10a-5p")
> s
[1] "mmu-miR-322-5p" "mmu-miR-10b-5p" "mmu-miR-10a-5p"
> sub("-5p$","", s)
[1] "mmu-miR-322" "mmu-miR-10b" "mmu-miR-10a"
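To make the difference concrete, here is a minimal sketch comparing the original pattern with the anchored one. The `mixed` vector is a made-up example (not from the question) that includes an element with -5p in the middle, to show that the anchored pattern leaves it alone.

```r
s <- c("mmu-miR-322-5p", "mmu-miR-10b-5p", "mmu-miR-10a-5p")

# The unescaped dot also consumes the character just before "-5p"
too_greedy <- gsub(".-5p", "", s)
print(too_greedy)
# [1] "mmu-miR-32" "mmu-miR-10" "mmu-miR-10"

# Hypothetical input: the second element has "-5p" in the middle only
mixed <- c("mmu-miR-322-5p", "mmu-5p-miR-10b")

# The "$" anchor restricts the match to the end of each element
anchored <- sub("-5p$", "", mixed)
print(anchored)
# [1] "mmu-miR-322"    "mmu-5p-miR-10b"
```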
Upvotes: 2
Reputation: 70722
In regular expressions, the dot . is a metacharacter with special meaning. On its own it matches any single character (in many engines, any character except a newline), which is why your pattern removes one extra character.
Since -5p occurs only once in each of your vector elements, sub is all that you need here.
> x <- c('mmu-miR-322-5p', 'mmu-miR-10b-5p', 'mmu-miR-10a-5p')
> sub('-5p', '', x)
# [1] "mmu-miR-322" "mmu-miR-10b" "mmu-miR-10a"
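As a small sketch of when sub is enough and when gsub would be needed: the `twice` string below is a hypothetical element (not from the question) that contains -5p two times. Passing `fixed = TRUE` is an optional choice here; it makes R treat the pattern as a literal string, so no character in it is interpreted as a metacharacter.

```r
x <- c("mmu-miR-322-5p", "mmu-miR-10b-5p", "mmu-miR-10a-5p")

# fixed = TRUE matches "-5p" literally instead of as a regular expression
literal <- sub("-5p", "", x, fixed = TRUE)
print(literal)
# [1] "mmu-miR-322" "mmu-miR-10b" "mmu-miR-10a"

# Hypothetical element with two occurrences of "-5p"
twice <- "mmu-5p-miR-322-5p"

first_only <- sub("-5p", "", twice, fixed = TRUE)   # removes the first match only
all_hits   <- gsub("-5p", "", twice, fixed = TRUE)  # removes every match
print(first_only)
# [1] "mmu-miR-322-5p"
print(all_hits)
# [1] "mmu-miR-322"
```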
Upvotes: 1
Reputation: 902
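You can also use the substitution operator in a Perl one-liner to remove -5p everywhere in an input file, slurping the whole file at once. The if/else around the substitution is unnecessary because both branches print the same thing, so -p (which prints after each substitution) is enough:
Perl one-liner:
perl -0777 -pe 's/-5p//ig' InputFile
The single quotes keep a Unix shell from expanding $ variables; on Windows cmd, use double quotes instead.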
Upvotes: 1