Reputation: 2635
I have this file, and all I need is the last five lines or so. I know that I am not supposed to parse HTML without an HTML module, but this is not really a strict program; all I really need is those last few lines. Besides, I cannot download any modules. I do have access to a proxy server that lets me curl files from the command line, so maybe there is a way to use cpan through the proxy, but that is another matter. The matter at hand is that when I parse out the last five lines or so, I don't get the "Names IN MY-DEPT that are restricted" line, and I want it. It gets skipped.
new_guy@casper0170foo:~/hey/hit_BANK_restricted.$ cat restricted.html.bak
To:DL-BANK@big_business.com
From:dl-dept?g-gsd-stm@big_business.com
Subject: Restricted List for 25-Nov-2014
Content-Type: text/html;
Content-Transfer-Encoding: quoted-print HTMLFILEable>
<HTML>
<HEAD>
<STYLE type="text/css">
body { font-family: verdana; font-size: 10pt }
td { font-size: 8pt; vertical-align: top }
td.cat { color: 6699FF ; background: 666699; text-align: right; vertical-align: bottom; height: 30 }
td.ind { width: 20pt }
td.link { }
td.desc { color: a0a0a0 }
a:visited { color: 800080; text-decoration: none }
</STYLE>
<TITLE>TRADES</TITLE>
</HEAD><BODY><TABLE width="80%" border="0" cellpadding="0" cellspacing="0">
<tr>
<td colspan="3" align="center">Names IN MY-DEPT that are restricted</td>
</tr>
<tr>
<td><b>Restriction Code</b></td>
<td><b>Company</b></td>
<td><b>Ticker</b></td>
</tr><tr><td>RL5</td><td>First Trust Global Risk Managed Inc</td><td>ETP</td></tr><font color="red"><tr><td>RLMT</td><td>GT Advanced Technologies Inc</td><td nowrap>GTATQ (position only, not in MY-DEPT)</td></tr></font></TABLE></BODY</HTML>new_guy@casper0170foo:~/hey/hit_BANK_restricted.$
new_guy@casper0170foo:~/hey/hit_BANK_restricted.$ cat parse_restrict2
#!/usr/bin/perl
use strict;
use warnings ;
my @restrict_codes = qw(RL3 RL5 RL5H RL6 REGM RAF RLMT RTCA RTCAH RTCB RTCBH RTCI RTCIH RLSI RLHK RLJP RPROP RLCB RLCS RLBZ RLBZH RLSUS);
my $rest_dir = "/home/new_guy/hey/hit_BANK_restricted./";
my $restrict_file = "restricted.html.bak" ;
open my $fh_rest_codes, '<', "$rest_dir$restrict_file" or die "cannot load $! " ;
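my @lines ;    # declared outside the loop so the foreach below can see it
while (<$fh_rest_codes>) {
next unless $_ =~ m/Names/;
# slurp the remaining lines of the file
@lines = <$fh_rest_codes> ;
}
foreach(@lines) {
s/td/ /g ;
s/<[^>]*>/ /g ;
foreach my $restrict (@restrict_codes) {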
s/$restrict/\n$restrict/g;
}
print $_ ;
sleep 1 ;
}
print "\n" ;
These are the results that I get. They are OK, but I would like to format them and I do not know how.
new_guy@casper0170foo:~/hey/hit_BANK_restricted.$ ./parse_restrict2
Restriction Code
Company
Ticker
RL5 First Trust Global Risk Managed Inc ETP
RLMT GT Advanced Technologies Inc GTATQ (position only, not in MY-DEPT)
new_guy@casper0170foo:~/hey/hit_BANK_restricted.$
Is there any way to get the lines in this kind of format?
Names IN MY-DEPT that are restricted
Restriction Code        Company                                 Ticker
RL5                     First Trust Global Risk Managed Inc     ETP
RLMT                    GT Advanced Technologies Inc            GTATQ (position only, not in MY-DEPT)
Upvotes: 0
Views: 53
Reputation: 40748
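Good question. The "Names" line is skipped because it is still sitting in $_ when you slurp the rest of the file into @lines, so it is never stored. Push it onto @lines first; you could try this workaround if you like: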
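my @lines;
while (<$fh_rest_codes>) {
next unless $_ =~ m/Names/;
push(@lines, $_);                  # keep the matching "Names" line itself
push (@lines, <$fh_rest_codes>);   # then slurp the rest of the file
}
my $str=join ('',@lines);
# the first <td> cell is the title
$str=~m|<td.*?>(.*?)</td>|;
print "$1\n\n";
# the first <tr>...</tr> block is the header row; /g leaves pos($str) just after it
$str=~ m|<tr>(.*?)</tr>|msg;
my $fmt="%-24s%-40s%-40s\n";
printf ($fmt, $1=~ m{<td><b>(.*?)</b></td>}msg );
# each remaining <tr>...</tr> block is a data row
while ($str=~ m|<tr>(.*?)</tr>|msg) {
printf ($fmt, $1=~ m{<td.*?>(.*?)</td>}msg );
}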
Output:
Names IN MY-DEPT that are restricted

Restriction Code        Company                                 Ticker
RL5                     First Trust Global Risk Managed Inc     ETP
RLMT                    GT Advanced Technologies Inc            GTATQ (position only, not in MY-DEPT)
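If you want to try the idea without the original file, here is a minimal self-contained sketch. It rests on a few assumptions that are mine, not from your script: the HTML is embedded in a heredoc and read through an in-memory filehandle instead of restricted.html.bak, format_report is a hypothetical helper name, every row after the title is assumed to have exactly three cells, and the last column uses %s rather than %-40s so the lines carry no trailing spaces. It is still regex-based, so it is only likely to hold up on markup as regular as this sample.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample input standing in for restricted.html.bak (trimmed from the question).
my $html = <<'END_HTML';
<TITLE>TRADES</TITLE>
</HEAD><BODY><TABLE width="80%" border="0" cellpadding="0" cellspacing="0">
<tr>
<td colspan="3" align="center">Names IN MY-DEPT that are restricted</td>
</tr>
<tr>
<td><b>Restriction Code</b></td>
<td><b>Company</b></td>
<td><b>Ticker</b></td>
</tr><tr><td>RL5</td><td>First Trust Global Risk Managed Inc</td><td>ETP</td></tr><font color="red"><tr><td>RLMT</td><td>GT Advanced Technologies Inc</td><td nowrap>GTATQ (position only, not in MY-DEPT)</td></tr></font></TABLE></BODY</HTML>
END_HTML

# Hypothetical helper: returns the formatted report as a list of lines.
sub format_report {
    my ($fh) = @_;
    my @lines;
    while (my $line = <$fh>) {
        next unless $line =~ /Names/;
        # Keep the matching line itself, then slurp the rest of the file.
        push @lines, $line, <$fh>;
    }
    my $str = join '', @lines;

    # Collect the text of every <td> cell in document order, inner tags stripped.
    my @cells;
    while ($str =~ m{<td[^>]*>(.*?)</td>}sg) {
        (my $text = $1) =~ s/<[^>]*>//g;
        push @cells, $text;
    }
    return () unless @cells;

    # The first cell is the title; the rest are assumed to come in rows of three.
    my $title = shift @cells;
    my @out = ("$title\n", "\n");
    while (my @row = splice @cells, 0, 3) {
        push @out, sprintf "%-24s%-40s%s\n", @row;
    }
    return @out;
}

# Read the sample through an in-memory filehandle.
open my $fh, '<', \$html or die "cannot open in-memory file: $!";
print format_report($fh);
close $fh;
```

To use it on the real file, open "$rest_dir$restrict_file" as in your script and pass that filehandle to format_report instead. Collecting the cells first and then grouping them in threes avoids relying on where the tr tags fall, at the cost of breaking if a row ever has a different number of cells.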
Upvotes: 1