Reputation: 64034
I have the following sentence:
zzzzzzz microRNA146a xxx (miR-146a, mir-33c) xxxx wwwwww Breast Cancer zzzz mir-33c kkk
I want to tag the words and phrases in that sentence that match some predefined regular expression rules. The result should look like this:
zzzzzzz [microRNA146a]<MIR-0> xxx ([miR-146a]<MIR-1>, [mir-33c]<MIR-2>) xxxx wwwwww [Breast Cancer]<CANCER-0> zzzz [mir-33c]<MIR-2> kkk
Note that in the output above, each word or phrase that satisfies a rule is indexed by the order in which it first occurs, so a repeated term keeps its index (mir-33c is MIR-2 both times).
I'm stuck with the following code. What's the right way to do it?
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
my $text = 'zzzzzzz microRNA146a xxx (miR-146a, mir-33c) xxxx wwwwww Breast Cancer zzzz';
# Rule 1 for miRNA definition
my @mirlist = ($text =~ /( mir-\d+\w+| microRNA\d+)/xgi);
# Rule 2 for special words/phrases
my @spec = ($text =~ /(Breast Cancer)/gi);
# These arrays already preserve the order of occurrence
print Dumper \@mirlist ;
print Dumper \@spec ;
# Not sure how to proceed from here
*Update:* Added the recurring miRNA and refined the desired output.
Upvotes: 1
Views: 479
Reputation: 19528
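Instead of collecting the matches into arrays, you can tag them in place with a substitution. The /e modifier evaluates the replacement as Perl code, so a counter can be incremented on each match: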
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my $text = 'zzzzzzz microRNA146a xxx (miR-146a, mir-33c) xxxx microRNA146a wwwwww Breast Cancer aaaa Breast Cancer zzzz mir-33c kkk';
# Rule 1 for miRNA definition
my $i = 0;
$text =~ s/(mir-\d\w+|microrna\d+\w?)/"[$1]<MIR-" . $i++ . ">"/gie;
# Rule 2 for special words/phrases
my $j = 0;
$text =~ s/(breast cancer)/"[$1]<CANCER-" . $j++ . ">"/gie;
print "$text\n";
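Note that the substitutions above give every occurrence a fresh number, so a term that appears twice gets two different indexes. If a repeated term should keep the index of its first occurrence, as in the desired output in the question, one option is to remember each term in a hash. The following is only a sketch under that assumption: `tag_terms` and `%index_of` are made-up names, terms are compared case-insensitively, and the miRNA pattern is a slightly loosened guess at the intended rule.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: tag every match of $pattern in $text as
# [match]<LABEL-n>. A repeated term (compared case-insensitively)
# reuses the index assigned at its first occurrence.
sub tag_terms {
    my ($text, $pattern, $label) = @_;
    my %index_of;    # lower-cased term => index
    my $next = 0;    # next unused index
    $text =~ s{($pattern)}{
        my $key = lc $1;
        $index_of{$key} = $next++ unless exists $index_of{$key};
        sprintf '[%s]<%s-%d>', $1, $label, $index_of{$key};
    }ge;
    return $text;
}

my $text = 'zzzzzzz microRNA146a xxx (miR-146a, mir-33c) xxxx wwwwww Breast Cancer zzzz mir-33c kkk';

# Rule 1: miRNA names (assumed pattern, not necessarily the exact rule wanted)
$text = tag_terms($text, qr/mir-\d+\w*|microRNA\d+\w*/i, 'MIR');

# Rule 2: special words/phrases
$text = tag_terms($text, qr/breast cancer/i, 'CANCER');

print "$text\n";
# prints:
# zzzzzzz [microRNA146a]<MIR-0> xxx ([miR-146a]<MIR-1>, [mir-33c]<MIR-2>) xxxx wwwwww [Breast Cancer]<CANCER-0> zzzz [mir-33c]<MIR-2> kkk
```

Each rule is applied in its own pass with its own hash, so the MIR and CANCER counters stay independent. This works here because neither pattern can match inside a tag that an earlier pass inserted; rules that could overlap would need a single combined pass instead.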
Upvotes: 2