Reputation: 1971
How can I display data (Stock Name, Capitals, Close Price, Market Value) from a website in the terminal? I have this website:
http://www.tpex.org.tw/web/stock/aftertrading/daily_mktval/mkt.php?l=en-us
and I have written the following so far:
my $url = 'http://www.tpex.org.tw/web/stock/aftertrading/daily_mktval/mkt.php?l=en-us';
use LWP::Simple;
my $content = get $url;
die "Couldn't get $url" unless defined $content;
But I don't really know how to use $content to print the data that I need.
I'd be grateful for any help :)
Upvotes: 0
Views: 106
Reputation: 126722
You need to take a look at the excellent HTML::TableExtract module.
Here's an example that uses the module to extract the data you require. I've used the URL for the printer-friendly version of the page for two reasons: the standard page uses JavaScript to build the table after it has been downloaded, so the table isn't available to LWP::Simple, which doesn't have JavaScript support; and the printer-friendly version includes all the information on a single page, whereas the main page splits it up into many short sections.
This is a far more robust, clear, and flexible technique than using regex patterns to parse HTML, which is generally a terrible idea.
use strict;
use warnings 'all';
use LWP::Simple;
use HTML::TableExtract;
use open qw/ :std :encoding(utf-8) /;
use constant URL => 'http://www.tpex.org.tw/web/stock/aftertrading/daily_mktval/mkt_print.php?l=en-us';
my $content = get URL or die "Couldn't get " . URL;
my $te = HTML::TableExtract->new( headers => [
qr/Stock\s+Name/,
qr/Capitals/,
qr/Close\s+Price/,
qr/Market\s+Value/,
] );
$te->parse($content);
for my $row ( $te->rows ) {
next unless $row->[0]; # Skip the final row with empty fields
$_ = qq{"$_"} for $row->[0]; # Enclose the Stock Name in quotes
tr/,//d for @{$row}[1,2,3]; # and remove commas from the numeric columns
print join(',', @$row), "\n";
}
Output:
"OBI Pharma, Inc.",171199584,594.00,101692
"Vanguard International Semiconductor Co.",1638982267,53.90,88341
"Hermes Microvision, Inc.",71000000,1155.00,82005
"TaiMed Biologics Inc.",247732750,238.00,58960
"Phison Electronics Corp.",197373993,271.00,53488
"FamilyMart.co.,Ltd",223220000,202.00,45090
"WIN SEMICONDUCTORS CORP.",596666262,65.30,38962
"PChome online Inc.",99854871,368.50,36796
"TUNG THIH ELECTRONIC CO.,LTD.",84488699,435.00,36752
"ST.SHINE OPTICAL CO.,LTD",50416516,694.00,34989
"POYA CO.,LTD",95277388,350.00,33347
"SIMPLO TECHNOLOGY CO.,LTD.",308284198,108.00,33294
"LandMark Optoelectronics Corporation",69909752,474.50,33172
"Ginko International Co., Ltd.",92697472,340.00,31517
"GIGASOLAR MATERIALS CORPORATION",60989036,506.00,30860
"TTY Biopharm Company Limited",248649959,114.00,28346
"CHIPBOND TECHNOLOGY CORPORATION",649261998,41.90,27204
"Globalwafers.Co.,Ltd.",369250000,69.10,25515
"eMemory Technology lnc.",75782242,321.00,24326
"Parade Technology, Ltd.",76111677,315.50,24013
"PharmaEngine, Inc.",102101000,235.00,23993
"JIH SUN FINANCIAL HOLDING CO., LTD",3396302860,6.86,23298
...
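The row clean-up inside the loop can be tried without network access. The sketch below applies the same two statements to one hand-written row. The sample values are illustrative only (assumed to resemble what the table cells hold before clean-up), and `clean_row` is a hypothetical helper name, not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the same clean-up as the loop body above
# to a copy of one row (Stock Name, Capitals, Close Price, Market Value).
sub clean_row {
    my @row = @_;
    $row[0] = qq{"$row[0]"};      # enclose the stock name in quotes
    tr/,//d for @row[1, 2, 3];    # delete commas from the numeric columns
    return join ',', @row;
}

# Illustrative sample row, not fetched from the site
print clean_row('OBI Pharma, Inc.', '171,199,584', '594.00', '101,692'), "\n";
```

Quoting the name matters because company names such as "OBI Pharma, Inc." contain commas themselves, which would otherwise be read as extra fields in the comma-separated output.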
Upvotes: 5
Reputation: 5927
Simple pattern matching and a few tricks are enough to do this.
In your code, $content contains the whole text of the page.
First, extract the table body from $content by using .+ with the s flag. The s flag lets . match any character, including a newline.
Second, split the extracted data on </tr>.
Third, iterate over the resulting array with foreach and apply another pattern match, this time with a capture group, to extract the data.
Here $l1 and $l2 store the rank and stock code, and the remaining data is stored in the @arc variable.
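use strict;
use warnings;
use LWP::Simple;
my $url = 'http://www.tpex.org.tw/web/stock/aftertrading/daily_mktval/mkt_print.php?l=en-us&d=2016/06/04&s=0,asc,0';
my $content = get $url;
die "Couldn't get $url" unless defined $content;
# Capture everything between <tbody> and </tbody>; /s lets . match newlines
my ($table_body) = $content =~ m/<tbody>(.+)<\/tbody>/s
    or die "No <tbody> found in $url";
# Every </tr> closes one table row
foreach my $lines ( split m{</tr>}, $table_body ) {
    # Capture the text of each cell; $l1 is the rank, $l2 the stock code
    my ( $l1, $l2, @arc ) = $lines =~ m/>(.+?)<\/td>/g;
    next unless @arc;    # skip fragments that contain no data cells
    print join( "\t\t", @arc ), "\n";
}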
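The three steps can also be checked offline against a small inline HTML string. This is only a sketch: the markup and values below are made up for illustration, the real page may be laid out differently, and `extract_rows` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one array ref per table row, holding the
# cells that follow the first two (rank and stock code).
sub extract_rows {
    my ($html) = @_;
    # Step 1: /s lets . match newlines, so the capture can span lines
    my ($table_body) = $html =~ m{<tbody>(.+)</tbody>}s or return;
    my @rows;
    # Step 2: every </tr> closes one row
    for my $line ( split m{</tr>}, $table_body ) {
        # Step 3: capture the text of each cell; the first two captures
        # (rank and stock code) are discarded
        my ( $rank, $code, @cells ) = $line =~ m{>(.+?)</td>}g;
        push @rows, \@cells if @cells;
    }
    return @rows;
}

# Made-up markup for illustration only
my $html = <<'HTML';
<table>
<tbody>
<tr><td>1</td><td>0001</td><td>Example Co.</td><td>100</td><td>12.50</td><td>1250</td></tr>
<tr><td>2</td><td>0002</td><td>Sample Inc.</td><td>200</td><td>3.00</td><td>600</td></tr>
</tbody>
</table>
HTML

print join( "\t", @$_ ), "\n" for extract_rows($html);
```

Note that this approach depends on the exact markup: an empty cell, nested tags inside a cell, or a cell that spans several lines would shift or drop columns, so it is best treated as a quick fix for a page whose layout you have inspected.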
Upvotes: 0