Reputation: 201
I want to split a string into separate columns. Each line looks like the one below.
TR10052|c9_g13_i6_DESeqResultsBacterialen=248 gi|497816164|ref|WP_010130320.1| 97.56 82 2 0 1 246 9 90 7e-51 167
I can split on whitespace, tabs, and "|", but I'm having trouble splitting the rest of the first section, "TR10052|c9_g13_i6_DESeqResultsBacterialen=248", by a specific match of characters.
I want the first column to be the TR##### piece, the second column to be the c#_g#_i# piece, and the third column to be the rest of it, starting with "_DESeq...".
while ( my $line = <RESULTS> ) {
    chomp $line;
    my @column = split( /[\t|] /_DES.*/ /, $line );
    my $transcriptID = $column[0];
    my $isoform = $column[1];
    my $deseq = $column[2];
}
Upvotes: 1
Views: 62
Reputation: 126752
It's easy to over-use split. In this case I think it's better to extract the fields you want by writing a custom regex pattern.
Like this:
use strict;
use warnings;
while ( <DATA> ) {
    my ($transcript_id, $isoform, $deseq) = /^ ([^|]+) \| (c\d+_g\d+_i\d+) _ (\S+)/x;
    print $_, "\n" for $transcript_id, $isoform, $deseq;
}
__DATA__
TR10052|c9_g13_i6_DESeqResultsBacterialen=248 gi|497816164|ref|WP_010130320.1| 97.56 82 2 0 1 246 9 90 7e-51 167
Output:
TR10052
c9_g13_i6
DESeqResultsBacterialen=248
Upvotes: 1
Reputation: 425298
Use a negative lookahead to split on underscores that are not followed by a lowercase letter and then a digit.
Try splitting on this regex:
/\||\_(?![a-z]\d)|\s+/
See live regex demo matching the desired characters on which to split.
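As a rough sketch of how this pattern could be used in a script (the variable names and the limit of 4 are my own assumptions, not part of the answer): passing a limit to split stops it after the first three fields, so the remainder of the line stays in one piece. Without a limit, the same pattern would also break up later fields such as "WP_010130320.1", because that underscore is followed by a digit rather than a lowercase letter. Note that the underscore before "DESeq" is consumed as a separator.

```perl
use strict;
use warnings;

my $line = 'TR10052|c9_g13_i6_DESeqResultsBacterialen=248 gi|497816164|ref|WP_010130320.1| 97.56 82 2 0 1 246 9 90 7e-51 167';

# Separators: a literal "|", an "_" that is NOT followed by a lowercase
# letter plus a digit, or a run of whitespace.
# The limit of 4 (an assumption for this sketch) keeps everything after
# the third field together in $rest.
my ($transcript_id, $isoform, $deseq, $rest) =
    split /\||_(?![a-z]\d)|\s+/, $line, 4;

print "$_\n" for $transcript_id, $isoform, $deseq;
```

If the leading underscore is needed on the third column, it can be added back with something like `$deseq = "_$deseq";`.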
Upvotes: 3
Reputation: 559
Two splits might make it easier for you:
my ($transcriptID, $rest) = split(/\|/, $line, 2);
my ($isoform, $deseq) = split(/_DESeq/, $rest, 2);
$deseq = "_DESeq$deseq";
Transforms:
"TR10052|c9_g13_i6_DESeqResultsBacterialen=248 gi|497816164|ref|WP_010130320.1| 97.56 82 2 0 1 246 9 90 7e-51 167"
Into:
"TR10052", "c9_g13_i6", "_DESeqResultsBacterialen=248 gi|497816164|ref|WP_010130320.1| 97.56 82 2 0 1 246 9 90 7e-51 167"
Is that what you're looking for?
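If the third column should stop at the first whitespace instead of carrying the rest of the line, one possible variation is to isolate the first whitespace-delimited field before the two splits. This is only a sketch under that assumption about the goal; `$first_field` is a hypothetical name introduced for illustration.

```perl
use strict;
use warnings;

my $line = 'TR10052|c9_g13_i6_DESeqResultsBacterialen=248 gi|497816164|ref|WP_010130320.1| 97.56 82 2 0 1 246 9 90 7e-51 167';

# Keep only the first whitespace-delimited field of the line.
my ($first_field) = split ' ', $line, 2;

# Same two splits as above, applied to that field alone.
my ($transcriptID, $rest) = split /\|/, $first_field, 2;
my ($isoform, $deseq)     = split /_DESeq/, $rest, 2;

# split consumed the "_DESeq" separator, so put it back.
$deseq = "_DESeq$deseq";

print "$_\n" for $transcriptID, $isoform, $deseq;
```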
Upvotes: 2