Ice

Reputation: 1971

Getting data from a table?

How can I display data (Stock Name, Capitals, Close Price, Market Value) from a website in the terminal? The website is:

http://www.tpex.org.tw/web/stock/aftertrading/daily_mktval/mkt.php?l=en-us

This is what I have written so far:

    use LWP::Simple;

    my $url = 'http://www.tpex.org.tw/web/stock/aftertrading/daily_mktval/mkt.php?l=en-us';
    my $content = get $url;
    die "Couldn't get $url" unless defined $content;

But I don't really know how to use $content to print the data which I need.

I'd be grateful for any help :)

Upvotes: 0

Views: 106

Answers (2)

Borodin

Reputation: 126722

You need to take a look at the excellent HTML::TableExtract module

Here's an example that uses the module to extract the data you require. I've used the URL for the printer-friendly version of the page for two reasons: the standard page uses JavaScript to build the table after it has been downloaded, so it isn't available to LWP::Simple which doesn't have JavaScript support; and it includes all the information on a single page, whereas the main page splits it up into many short sections

This is a far more robust, clear, and flexible technique than using regex patterns to parse HTML, which is generally a terrible idea

use strict;
use warnings 'all';

use LWP::Simple;
use HTML::TableExtract;

use open qw/ :std :encoding(utf-8) /;

use constant URL => 'http://www.tpex.org.tw/web/stock/aftertrading/daily_mktval/mkt_print.php?l=en-us';

my $content = get URL or die "Couldn't get " . URL;

my $te = HTML::TableExtract->new( headers => [
    qr/Stock\s+Name/,
    qr/Capitals/,
    qr/Close\s+Price/,
    qr/Market\s+Value/,
] );

$te->parse($content);

for my $row ( $te->rows ) {

    next unless $row->[0];        # Skip the final row with empty fields

    $_ = qq{"$_"} for $row->[0];  # Enclose the Stock Name in quotes
    tr/,//d for @{$row}[1,2,3];   # and remove commas from the numeric columns

    print join(',', @$row), "\n";
}

output

"OBI Pharma, Inc.",171199584,594.00,101692
"Vanguard International Semiconductor Co.",1638982267,53.90,88341
"Hermes Microvision, Inc.",71000000,1155.00,82005
"TaiMed Biologics Inc.",247732750,238.00,58960
"Phison Electronics Corp.",197373993,271.00,53488
"FamilyMart.co.,Ltd",223220000,202.00,45090
"WIN SEMICONDUCTORS CORP.",596666262,65.30,38962
"PChome online Inc.",99854871,368.50,36796
"TUNG THIH ELECTRONIC CO.,LTD.",84488699,435.00,36752
"ST.SHINE OPTICAL CO.,LTD",50416516,694.00,34989
"POYA CO.,LTD",95277388,350.00,33347
"SIMPLO TECHNOLOGY CO.,LTD.",308284198,108.00,33294
"LandMark Optoelectronics Corporation",69909752,474.50,33172
"Ginko International Co., Ltd.",92697472,340.00,31517
"GIGASOLAR MATERIALS CORPORATION",60989036,506.00,30860
"TTY Biopharm Company Limited",248649959,114.00,28346
"CHIPBOND TECHNOLOGY CORPORATION",649261998,41.90,27204
"Globalwafers.Co.,Ltd.",369250000,69.10,25515
"eMemory Technology lnc.",75782242,321.00,24326
"Parade Technology, Ltd.",76111677,315.50,24013
"PharmaEngine, Inc.",102101000,235.00,23993
"JIH SUN FINANCIAL HOLDING CO., LTD",3396302860,6.86,23298
...
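
The two cleanup lines in the loop can be tried without fetching the page. Below is a minimal sketch using only core Perl; the `clean_row` helper and the sample row (including the thousands separators in the numeric fields) are assumptions made for illustration, not something taken from the site's markup or from HTML::TableExtract.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the same cleanup as the loop above to one row.
sub clean_row {
    my @row = @_;                     # work on a copy, leave the caller's data alone
    $row[0] = qq{"$row[0]"};          # quote the name, which may itself contain commas
    tr/,//d for @row[1 .. $#row];     # strip thousands separators from the numbers
    return join ',', @row;
}

# Sample row; the comma-grouped number format is an assumption
my $line = clean_row('OBI Pharma, Inc.', '171,199,584', '594.00', '101,692');
print "$line\n";
```

The quoting matters because several stock names contain commas that would otherwise be read as field separators. If a name could itself contain double quotes, a dedicated CSV module such as Text::CSV would probably be a safer choice than quoting by hand.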

Upvotes: 5

mkHun

Reputation: 5927

Simple pattern matching and a few tricks are enough to do this.

In your code, $content contains the whole text of the page.

First, extract the table body from $content using .+ with the s flag. The s flag lets . match any character, including a newline.

Second, split the extracted data on </tr>.

Third, iterate over the resulting array with foreach and apply another pattern match with a capture group to extract the data.

Here $l1 and $l2 store the rank and stock code, and the remaining fields are stored in the @arc array.

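use strict;
use warnings;
use LWP::Simple;

my $url = 'http://www.tpex.org.tw/web/stock/aftertrading/daily_mktval/mkt_print.php?l=en-us&d=2016/06/04&s=0,asc,0';
my $content = get $url;
die "Couldn't get $url" unless defined $content;

# Capture everything between <tbody> and </tbody>; /s lets . match newlines
my ($table_body) = $content =~ m/<tbody>(.+)<\/tbody>/s;
die "No <tbody> found in $url" unless defined $table_body;

# One chunk per table row
my @ar = split /<\/tr>/, $table_body;

foreach my $lines (@ar) {
    # Capture the text of every cell; the first two are rank and stock code
    my ($l1, $l2, @arc) = $lines =~ m/>(.+?)<\/td>/g;
    next unless @arc;    # skip chunks that contain no data cells
    print join("\t\t", @arc), "\n";
}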
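
The three steps can also be tried offline on a small piece of markup. The sketch below is only an illustration: the HTML snippet, its placeholder values, and the `extract_rows` helper are assumptions, and the real page may be laid out differently.

```perl
use strict;
use warnings;

# Hypothetical markup with placeholder values; not copied from the real page
my $content = <<'HTML';
<table><tbody>
<tr>
<td>1</td>
<td>0001</td>
<td>Example Co.</td>
<td>1,000</td>
<td>10.50</td>
<td>10,500</td>
</tr>
<tr>
<td>2</td>
<td>0002</td>
<td>Sample Corp.</td>
<td>2,000</td>
<td>5.25</td>
<td>10,500</td>
</tr>
</tbody></table>
HTML

# Hypothetical helper wrapping the three steps described above
sub extract_rows {
    my ($html) = @_;

    # Step 1: grab the table body; /s lets . match newlines too
    my ($table_body) = $html =~ m/<tbody>(.+)<\/tbody>/s;
    return unless defined $table_body;

    my @rows;
    # Step 2: split the body into one chunk per table row
    for my $chunk ( split /<\/tr>/, $table_body ) {
        # Step 3: capture each cell's text; discard rank and stock code
        my ( $rank, $code, @fields ) = $chunk =~ m/>(.+?)<\/td>/g;
        push @rows, \@fields if @fields;    # ignore chunks without cells
    }
    return @rows;
}

print join( "\t", @$_ ), "\n" for extract_rows($content);
```

Note that the cell pattern depends on the layout of the markup. If `<tr><td>` appeared together on one line, the first capture would include the `<td>` tag itself, so the approach needs checking against the actual page source.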

Upvotes: 0
