BioMan

Reputation: 704

remove part of string - regex

I want to remove the -5p part of my data below:

[1] mmu-miR-322-5p mmu-miR-10b-5p mmu-miR-10a-5p

I tried gsub(".-5p","",data), but it removed more than just the -5p.

Upvotes: 1

Views: 224

Answers (3)

Avinash Raj

Reputation: 174696

The . matches any character, so your regex .-5p also consumes the character before -5p; that is, it matches 2-5p, b-5p and a-5p in the input vector. Because -5p sits at the end of each string, you can put the end-of-line anchor $ after -5p. Note that this removes -5p only when it is at the end of a string. If it appears in the middle or at the start, nothing is changed.

sub("-5p$","",data)

Example:

> s <- c("mmu-miR-322-5p", "mmu-miR-10b-5p", "mmu-miR-10a-5p")
> s
[1] "mmu-miR-322-5p" "mmu-miR-10b-5p" "mmu-miR-10a-5p"
> sub("-5p$","", s)
[1] "mmu-miR-322" "mmu-miR-10b" "mmu-miR-10a"
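To make the difference concrete, here is a small sketch that compares the pattern from the question with the anchored one. The string "mmu-5p-miR-322" is a made-up element, added only to show that the anchor leaves a non-trailing -5p alone.

```r
s <- c("mmu-miR-322-5p", "mmu-miR-10b-5p", "mmu-miR-10a-5p")

# Pattern from the question: "." also consumes the character before "-5p"
over <- gsub(".-5p", "", s)

# Anchored pattern: only a trailing "-5p" is removed
anchored <- sub("-5p$", "", s)

# Hypothetical element with "-5p" in the middle: the anchored pattern does not match
untouched <- sub("-5p$", "", "mmu-5p-miR-322")

print(over)       # "mmu-miR-32" "mmu-miR-10" "mmu-miR-10"
print(anchored)   # "mmu-miR-322" "mmu-miR-10b" "mmu-miR-10a"
print(untouched)  # "mmu-5p-miR-322"
```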

Upvotes: 2

hwnd

Reputation: 70722

In regular expressions, the dot . is a metacharacter with a special meaning. On its own, it matches any single character (in most flavors, any character except a newline), which is why your pattern removes one extra character.

Since you have one occurrence of -5p in each of your vector elements, sub is all that you need here.

> x <- c('mmu-miR-322-5p', 'mmu-miR-10b-5p', 'mmu-miR-10a-5p')
> sub('-5p', '', x)
# [1] "mmu-miR-322" "mmu-miR-10b" "mmu-miR-10a"
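As a rough illustration of why sub is enough here: sub replaces only the first match in each element, while gsub replaces every match. The string y below is a hypothetical element containing -5p twice, and fixed = TRUE is an optional choice that treats the pattern as a literal string rather than a regex.

```r
# Hypothetical element with two occurrences of "-5p" (not from the question's data)
y <- "mmu-5p-miR-322-5p"

# sub() removes only the first occurrence
first_only <- sub("-5p", "", y, fixed = TRUE)

# gsub() removes every occurrence
all_hits <- gsub("-5p", "", y, fixed = TRUE)

print(first_only)  # "mmu-miR-322-5p"
print(all_hits)    # "mmu-miR-322"
```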

Upvotes: 1

Praveen

Reputation: 902

You can also use the substitution operator in a Perl one-liner to remove every -5p from the input file, slurping the whole file at once:

Perl one-liner:

perl -0777 -pe "s/-5p//gi" InputFile
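If you would rather stay in R, a comparable file-wide removal could look roughly like the sketch below. The file name and its contents are assumptions: a temporary file is created here only so the example runs on its own, and ignore.case = TRUE mirrors the i flag of the one-liner.

```r
# Hypothetical input file, written here only to make the sketch self-contained
infile <- tempfile(fileext = ".txt")
writeLines(c("mmu-miR-322-5p", "mmu-miR-10b-5p", "mmu-miR-10a-5p"), infile)

# Read all lines, remove every "-5p" (case-insensitively), and print the result
lines <- readLines(infile)
cleaned <- gsub("-5p", "", lines, ignore.case = TRUE)
writeLines(cleaned)
```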

Upvotes: 1
