Reputation: 9
How can I display only the chains (such as A, C, E, G) that end with a semicolon (;)?
Data
COMPND MOL_ID: 1;
COMPND 2 MOLECULE: JACALIN;
COMPND 3 CHAIN: A, C, E, G;
COMPND 4 SYNONYM: JACKFRUIT AGGLUTININ;
COMPND 5 MOL_ID: 2;
COMPND 6 MOLECULE: JACALIN;
COMPND 7 CHAIN: B, D, F, H;
COMPND 8 SYNONYM: JACKFRUIT AGGLUTININ
I tried the code below:
#!usr/local/bin/perl
open(FILE, "/home/httpd/cgi-bin/r/1JAC.pdb");
while ( $line = <FILE> ) {
    if ( $line =~ /^COMPND/ ) {
        #$line = substr $line,4,21;
        my $line =~ m(/\$:^\w+\$\;/g);
        print $line;
    }
}
Upvotes: 0
Views: 131
Reputation: 247072
Using GNU grep with Perl-compatible regular expressions (-P), print only (-o) the text between "CHAIN: " and the next semicolon:
$ grep -oP '(?<=CHAIN: ).*?(?=;)' filename
A, C, E, G
B, D, F, H
Upvotes: 0
Reputation: 16354
You can use a single regular expression like the following:
while (my $line = <FILE>) {
    if ($line =~ /^COMPND.+?CHAIN:\s*(.*?)\s*;\s*$/) {
        my $chain = $1;
        print "$chain\n";
    }
}
This uses a regular expression to match COMPND, CHAIN: and a closing ; on the same line. The \s* at the end of the regular expression matches any trailing whitespace. The string between CHAIN: and ; is captured, without leading or trailing spaces, in $1, which is then assigned to the $chain variable.
More information on Perldoc: Perlre - Perl regular expressions.
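To see the capture in isolation, here is a minimal sketch that applies the same pattern to sample records adapted from the question instead of reading the file. The extract_chains helper and the @sample array are hypothetical names introduced only for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the answer above): returns the chain
# list from every line that matches the COMPND ... CHAIN: ... ; pattern.
sub extract_chains {
    my @lines = @_;
    my @chains;
    for my $line (@lines) {
        # Same pattern as above: capture the text between "CHAIN:" and ";"
        if ($line =~ /^COMPND.+?CHAIN:\s*(.*?)\s*;\s*$/) {
            push @chains, $1;
        }
    }
    return @chains;
}

# Sample records adapted from the question (assumed one record per line)
my @sample = (
    "COMPND MOL_ID: 1;\n",
    "COMPND 2 MOLECULE: JACALIN;\n",
    "COMPND 3 CHAIN: A, C, E, G;\n",
    "COMPND 4 SYNONYM: JACKFRUIT AGGLUTININ;\n",
    "COMPND 5 MOL_ID: 2;\n",
    "COMPND 6 MOLECULE: JACALIN;\n",
    "COMPND 7 CHAIN: B, D, F, H;\n",
    "COMPND 8 SYNONYM: JACKFRUIT AGGLUTININ\n",
);

# Print one chain list per line
print "$_\n" for extract_chains(@sample);
```

Only the two CHAIN records match, so the MOL_ID, MOLECULE and SYNONYM lines are skipped even though they also end with a semicolon.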
Upvotes: 1
Reputation: 5925
Try this
use warnings;
use strict;
open my $nis, '<', '1jac.pdb' or die "Cannot open 1jac.pdb: $!";
my @ar = grep{ m/^COMPND/g} <$nis>;
my $s = join("",@ar);
my @dav;
my @mp2 = map{split(/,\s|,/, $_)} grep{ s/(COMPND\s+\d+\s+(CHAIN\:\s+)?)|(\n|;)//g} @dav= $s =~m/(COMPND\s+\d+\s+CHAIN\:.+?(?:.|\n)+?\;)/g;
$, = ", ";
print @mp2;
Output
A, C, E, G, B, D, F, H
Upvotes: -1
Reputation: 126742
You may like this one-line solution
perl -le 'print for map /CHAIN:\s*([^;]+)/, <>' /home/httpd/cgi-bin/r/1JAC.pdb
Output
A, C, E, G
B, D, F, H
Upvotes: 0
Reputation: 67910
perl -nle'print $1 if /^COMPND\s+\S*\s*CHAIN:(.+);/' /home/httpd/cgi-bin/r/1JAC.pdb
This is a fairly simple way of "grepping" part of a line to standard output. It captures whatever the parenthesised group matches and prints it.
-n wraps the code in a while (<>) loop that reads your file line by line.
-l handles newlines: it chomps each input line and appends a newline to each print.
Upvotes: 2