Reputation: 377
I'm working with a file format whose lines look like this:
ATOM 1 N LYS A 56 20.508 14.774 -7.432 1.00 50.83 N
All I want is the first number and the three numbers following '56' in the example above, so I'm using a regular expression to get that information. How do I then put that information into a hash?
So far I have:
my $pdb_file = $ARGV[0];
open (PDBFILE, "<$pdb_file") or die ("$pdb_file not found");
while (<PDBFILE>) {
    if ($_ =~ /^ATOM\s+(\d+)\s+\w+\s+\w+\s+\w+\s+\d+\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)/) {
        my $atom = $1;
        my $xcor = $2;
        my $ycor = $3;
        my $zcor = $4;
        print "AtomNumber: $atom\t xyz: $xcor $ycor $zcor\n";
    }
}
Upvotes: 0
Views: 59
Reputation: 165606
Instead of a regex, I would recommend using split to break the line into fields on whitespace. This will be faster and more robust, because it doesn't depend on detailed knowledge of each field's format (which could change, for example if a number has a minus sign that you forgot to take into account). It's also a lot easier to understand.
my @fields = split /\s+/, $line;
Then you can pick out the fields (for example, the first number is field 2, so $fields[1]) and put them into your hash.
my %coordinate = (
    atom => $fields[1],
    x    => $fields[6],
    y    => $fields[7],
    z    => $fields[8],
);
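As a quick check of those indices, here is a minimal sketch that splits the sample line from the question and builds the hash. It assumes `$line` holds one record with no leading whitespace. Note that the negative z value needs no special handling here.

```perl
use strict;
use warnings;

# Sample record copied from the question.
my $line = 'ATOM 1 N LYS A 56 20.508 14.774 -7.432 1.00 50.83 N';

# Split on runs of whitespace; index 0 is the record name "ATOM".
my @fields = split /\s+/, $line;

# Pick out the atom number and the three coordinates by position.
my %coordinate = (
    atom => $fields[1],
    x    => $fields[6],
    y    => $fields[7],
    z    => $fields[8],
);

print "AtomNumber: $coordinate{atom}\t xyz: $coordinate{x} $coordinate{y} $coordinate{z}\n";
```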
You're reading a bunch of lines, so you're going to make a bunch of hashes which have to go somewhere. I'd recommend putting them all in another hash with some sort of unique field as the key. Possibly the atom field.
$atoms{$coordinate{atom}} = \%coordinate;
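Putting the pieces together, the whole loop might look roughly like the sketch below. The helper name `read_atoms` and the lexical filehandle are my own choices for illustration, not something from the question, and the sketch assumes the atom number is unique within the file. Because `%coordinate` is declared with `my` inside the loop, each iteration stores a reference to a fresh hash.

```perl
use strict;
use warnings;

# Hypothetical helper: read ATOM records from an open filehandle and
# return a reference to a hash of hashes keyed by atom number.
sub read_atoms {
    my ($fh) = @_;
    my %atoms;
    while (my $line = <$fh>) {
        next unless $line =~ /^ATOM\b/;    # skip anything that is not an ATOM record
        my @fields = split /\s+/, $line;
        # A new lexical hash per iteration, so every stored reference is distinct.
        my %coordinate = (
            atom => $fields[1],
            x    => $fields[6],
            y    => $fields[7],
            z    => $fields[8],
        );
        $atoms{ $coordinate{atom} } = \%coordinate;
    }
    return \%atoms;
}

# Run only when a file name is given on the command line.
if (@ARGV) {
    my $pdb_file = $ARGV[0];
    open my $pdb_fh, '<', $pdb_file or die "Cannot open $pdb_file: $!";
    my $atoms = read_atoms($pdb_fh);
    close $pdb_fh;

    # Print the atoms in numeric order of their atom number.
    for my $n (sort { $a <=> $b } keys %$atoms) {
        my $c = $atoms->{$n};
        print "AtomNumber: $n\t xyz: $c->{x} $c->{y} $c->{z}\n";
    }
}
```

Afterwards a single coordinate can be looked up directly, for example `$atoms->{1}{x}` for the x coordinate of atom 1.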
Upvotes: 4