Reputation: 214
I have tried several methods from previous questions on how to parse table information from a website, such as HTML::TableExtract and HTML::Parser, but they do not work for me. Below is my code:
my $browser = LWP::UserAgent->new( ssl_opts => { verify_hostname => 0, } );
my $url = 'http://reitdata.com/';
my $response = $browser->get($url);
die "Error at $url\n ", $response->status_line, "\n Aborting" unless $response->is_success;
my $te = HTML::TableExtract->new( headers => [qw(REIT PERIOD MKT DPU YIELD NAV GEARING ASSETS)]);
$te->parse($browser);
foreach my $ts ($te->tables) {
    print "Table (", join(',', $ts->coords), "):\n";
    foreach my $row ($ts->rows) {
        print join(',', @$row), "\n";
    }
}
The code above shows no output. Is there a problem with how the code gets the table information from the website? Additionally, can I output the information from the website in table form? Below is the HTML for the table.
<select name="ww" size="1" style="font-family: sans-serif; font-size: 9pt;" onchange="location.href = '/~sipesoft/cgi/sipesoft.cgi?report=ndashboard-'+ document.myform.family.value + ':' + document.myform.rpt.value + '*' + document.myform.ww.value"><option selected value="201730">201730 </option>
<option value="201729">201729 </option>
<option value="201728">201728 </option>
<option value="201727">201727 </option>
<option value="201726">201726 </option>
<option value="201725">201725 </option>
<option value="201724">201724 </option>
<option value="201723">201723 </option>
<option value="201722">201722 </option>
</tr>
<tr>
<td><hr color="#000000" size="2"></td>
</tr>
<tr>
<td>
<table border=0 align=center cellspacing=0 cellpadding=0>
<tr>
<td>
<table border=1 align=left cellspacing=3 cellpadding=2>
<tr>
<td align="center" valign="bottom" bgcolor="#C0C0C0" width="45"><b><font face="Tahoma" size="1">Name</font></b></td>
<td align="center" valign="bottom" bgcolor="#C0C0C0" width="60"><b><font face="Tahoma" size="1">Age</font></b></td>
<td align="center" valign="bottom" bgcolor="#C0C0C0" width="40"><b><font face="Tahoma" size="1">Mark<br>Count</font></b></td>
<td align="center" valign="bottom" bgcolor="#C0C0C0" width="40"><b><font face="Tahoma" size="1">Grade</font></b></td>
<td align="center" valign="bottom" bgcolor="#C0C0C0" width="40"><b><font face="Tahoma" size="1">Hobby</font></b></td>
<td align="center" valign="bottom" bgcolor="#C0C0C0" width="40"><b><font face="Tahoma" size="1">Attendence</font></b></td>
</tr>
</table>
Upvotes: 1
Views: 362
Reputation: 66964
To get us on the same page, this is how we can pull tables from this page:
use warnings;
use strict;
use feature 'say';
use LWP::UserAgent;
use HTML::TableExtract;
my $url = 'https://stackoverflow.com/q/45452726/4653379';
my $ua = LWP::UserAgent->new;
my $response = $ua->get($url);
die "Error at $url\n ", $response->status_line if not $response->is_success;
my $page = $response->decoded_content;
my $te = HTML::TableExtract->new;
$te->parse($page);
foreach my $tbl ($te->tables) {
    say "Table (", join(',', $tbl->coords), ")";
}
with output
Table (1,0) ... Table (0,3)
Here is a table from the url in the question, with a caveat.
use warnings;
use strict;
use open ':std', ':encoding(UTF-8)';
use LWP::UserAgent;
use HTML::TableExtract;
use Text::Table;
my $url = q(http://reitdata.com/);
my $ua = LWP::UserAgent->new;
my $response = $ua->get($url);
my $page = $response->decoded_content;
my @headers = qw(REIT PERIOD MKT DPU YIELD NAV GEARING ASSETS);
my $te = HTML::TableExtract->new( headers => \@headers );
$te->parse($page);
my @data;
foreach my $tbl ( ($te->tables)[1] ) { # just the second one
    foreach my $row ($tbl->rows) {
        my @row = map { s{^\s*|\s*$}{}gr } @$row;
        push @data, \@row;
    }
}
my $tb = Text::Table->new( map { $_, \' ' } @headers );
$tb->load( @data );
print $tb;
The regex in the map block uses the non-destructive /r modifier, which returns the changed string (the original stays unchanged). We need v5.14.0 for it; on older versions use map { s{..}{}g; $_ } instead, which changes the elements of @$row in place.
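To make the difference concrete, here is a minimal sketch with made-up cells (the @cells array is only an illustration, not data from the site). It trims whitespace with /r, and also shows a copy-based variant that runs on older perls without touching the source array.

```perl
use strict;
use warnings;

# Hypothetical sample cells with stray whitespace, standing in for one table row.
my @cells = ("  Sabana REIT ", "\tQ2 - Jun17\n", " 7.222% ");

# Non-destructive: s///r returns the trimmed copy and leaves @cells untouched (v5.14+).
my @trimmed = map { s{^\s+|\s+$}{}gr } @cells;

# Pre-5.14 variant: copy each element first, so the source array is not changed.
my @trimmed_old = map { my $c = $_; $c =~ s{^\s+|\s+$}{}g; $c } @cells;

print join('|', @trimmed), "\n";
print "originals untouched\n" if $cells[0] eq "  Sabana REIT ";
```

Without the explicit copy, s///g inside map works on $_, which is an alias to the array element, so the original data gets modified.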
The table is printed using Text::Table. The good old printf can do this job as well.
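A rough sketch of the printf route, assuming the rows are in an array of array references like @data above. The sample rows and the three-column header list here are made up for brevity. Column widths are computed from the longest cell so that the columns line up.

```perl
use strict;
use warnings;

# Hypothetical rows, shaped like the @data array of array refs built from the table.
my @headers = qw(REIT PERIOD YIELD);
my @data = (
    [ 'Sabana REIT', 'Q2 - Jun17', '7.222%' ],
    [ 'FHT',         'Q3 - Jun17', '6.689%' ],
);

# Compute each column's width from the longest cell (header included).
my @width = map { length } @headers;
for my $row (@data) {
    for my $i (0 .. $#$row) {
        my $len = length $row->[$i];
        $width[$i] = $len if $len > $width[$i];
    }
}

# Build a format like "%-11s %-10s %-6s\n"; the minus sign left-justifies.
my $fmt = join(' ', map { "%-${_}s" } @width) . "\n";

printf $fmt, @headers;
printf $fmt, @$_ for @data;
```

This needs no extra module, at the cost of working out the widths by hand.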
For more on table processing see this post, and this one with links, for example.
This prints
REIT             PERIOD     MKT    DPU    YIELD  NAV     GEARING ASSETS
SoilbuildBizREIT Q2 – Jun17 $0.710 1.4660 8.259% $0.720  37.90%  Industrial (12) : Business Park 32% + Industrial 68% by NPI
Cache Log Trust  Q2 – Jun17 $0.885 1.8000 8.158% $0.770  43.40%  Industrial (19) : Singapore (83%) + Australia (16%) + China (1%) by Gross Revenue
Viva Ind Tr      Q2 – Jun17 $0.925 1.861  8.069% $0.790  39.10%  Industrial (9) : Biz Park (50.4%) + Light Industrial (23.4%) + Logistics (15.4%) + Hotel (10.8%) by NPI
EC World Reit    Q1 – Mar17 $0.775 1.5410 8.065% $0.900  28.60%  Port, Warehouse & e-Commerce Infrastructure in China
Lippo Malls Tr   Q1 – Mar17 $0.460 0.890  7.739% $0.374  32.20%  Retail (Indonesia) – 20
BHG Retail Reit  Q1 – Mar17 $0.735 1.3900 7.565% $0.820  32.50%  Retail (China) – 5
AIMSAMP Cap Reit Q1 – Jun17 $1.440 2.500  7.500% $1.386  36.30%  Industrial (27) : Singapore + Australia
IREIT Global     Q1 – Mar17 $0.790 1.4400 7.291% $0.672  42.10%  Offices : Germany (5)
Sabana REIT      Q2 – Jun17 $0.450 0.810  7.222% $0.560  37.00%  Industrial (21)
ManulifeREIT USD Q1 – Mar17 $0.920 1.6500 7.174% $0.830  34.20%  Offices : USA (3)
OUE Com Reit     Q1 – Mar17 $0.730 1.230  6.973% $0.860  36.20%  Office (82.6%) + Retail (17.4%) ; Singapore (79.9%) + China (20.1%) by Revenue
OUE Htrust       Q1 – Mar17 $0.755 1.3000 6.887% $0.760  38.10%  Hotel (78%) + Retail (22%) by NPI
Frasers Com Tr   Q3 – Jun17 $1.400 2.398  6.871% $1.520  35.90%  Singapore (52.7%) + Australia (47.3%) by NPI
ESR-REIT         Q2 – Jun17 $0.565 0.9560 6.768% $0.633  37.90%  Industrial (49)
Ascendas-hTrust  2H – Mar17 $0.840 3.010  6.762% $0.920  32.20%  Hotels (11) : Australia (51%) + Japan (29%) + Singapore (14%) + China (6%) by NPI
FHT              Q3 – Jun17 $0.740 1.2374 6.689% $0.749  34.10%  Hotel (9) + Serviced Apt (6) : Australia (38%) + Singapore (20%) + UK (17%) + Japan (14%) + Malaysia (6%) + Germany (5%) by NPI
Mapletree GCC Tr Q1 – Jun17 $1.110 1.851  6.614% $1.244  39.40%  Retail + Office : HK (69.4%) + China (30.6%) by NPI ; Retail (62%) + Office (36.5%) by NPI
Ascott Reit      1H – Jun17 $1.190 3.3560 6.511% $1.190  32.40%  Serviced Apts (73) : Asia Pacific (61.6%) + Europe (28.4%) + US (10%) by Assets
First REIT       Q2 – Jun17 $1.350 2.140  6.393% $1.004  31.00%  Hospitals (13 – 1 in S Korea) + Hotel (Indonesia – 2) + Nursing Home (Singapore – 3)
Mapletree Ind Tr Q1 – Jun17 $1.855 2.9200 6.296% $1.400  29.80%  Industrial (86)
Mapletree Log Tr Q1 – Jun17 $1.200 1.887  6.290% $1.020  39.00%  Industrial (127)
Far East HTrust  Q1 – Mar17 $0.670 0.9300 6.239% $0.903  32.30%  Hotels (65.2%) + Commercial (23.1%) + Serviced Apts (11.7%) by Revenue
CapitaR China Tr 1H – Jun17 $1.660 5.360  6.078% $1.520  35.30%  Retail (China) – 11
Frasers L&I Tr   Q3 – Jun17 $1.095 1.7500 6.076% $0.920  29.30%  Industrial (Australia) – 54
StarhillGbl Reit Q4 – Jun17 $0.780 1.180  6.064% $0.910  35.30%  Retail + Office : Singapore (62.5%) + Australia (23.0%) + Malaysia (12.5%) + Others (2.0%) by Revenue
CDL Htrust       1H – Jun17 $1.600 4.1000 6.031% $1.545  38.70%  Hotels : Singapore (58.1%) + Australia (10.2%) + Maldives (7.6%) + NZ (14.2%) + UK (6.1%) + Japan (3.7%) by NPI
Ascendas Reit    Q1 – Jun17 $2.700 4.049  5.811% $2.040  33.90%  Industrial (132) : Singapore (86%) + Australia (14%) by Valuation
Keppel DC REIT   1H – Jun17 $1.280 3.6300 5.672% $0.931  27.70%  Data Centres – 12 + 1 (Under Devt)
Frasers Cpt Tr   Q3 – Jun17 $2.100 3.000  5.593% $1.920  30.00%  Retail (6) + 31.17% of Hektar (MREIT)
CapitaMall Trust Q2 – Jun17 $2.010 2.7500 5.542% $1.910  34.70%  Retail (16) + Office
SPHREIT          Q3 – May17 $1.000 1.370  5.520% $0.940  25.60%  Retail (2)
Mapletree Com Tr Q1 – Jun17 $1.605 2.2300 5.495% $1.370  36.40%  Retail + Office
CapitaCom Trust  1H – Jun17 $1.720 4.590  5.337% $1.770  35.20%  Office (73%) + Retail (16%) + Hotel (11%) by Gross Rental Income
Suntec Reit      Q2 – Jun17 $1.900 2.4930 5.289% $2.094  36.10%  Office (69%) + Retail (28%) + Convention (3%) by Income
Fortune Reit HKD 1H – Jun17 $9.720 25.530 5.253% $13.390 28.40%  Retail (HK) – 17
Keppel Reit      Q2 – Jun17 $1.160 1.4200 4.897% $1.400  38.50%  Office (8) : Singapore (89%) + Australia (11%) by Asset Value
ParkwayLife Reit Q2 – Jun17 $2.710 3.320  4.576% $1.680  37.40%  Hospitals + Nursing Homes = 49 : Singapore 60% + Japan 40% by Gross Revenue
Saizen REIT      2H – Jun15 $0.033 2.930  0.000% $1.210  35.00%  Residential (Japan) – 136
The caveat: this isn't the second table on the page, but the one in the section "July 2017." The module only sees the very first table and this one, which has to do with the web site itself. That is a separate issue, which I have to leave for now.
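One way to check which tables the module actually picked up is to print their coordinates. Below is a minimal sketch on a made-up two-table page (the HTML is hypothetical, not taken from the site). It assumes the documented behavior that, with headers given, only tables containing those headers are returned, and that coords reports the (depth, count) position in the document.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical two-table page; only the second table has the headers we ask for.
my $html = <<'HTML';
<table><tr><td>Menu</td><td>Login</td></tr></table>
<table>
<tr><th>REIT</th><th>YIELD</th></tr>
<tr><td>Sabana REIT</td><td>7.222%</td></tr>
</table>
HTML

my $te = HTML::TableExtract->new( headers => [qw(REIT YIELD)] );
$te->parse($html);
$te->eof;

# An index into this list counts matched tables, not all tables on the page.
my @tables = $te->tables;
for my $tbl (@tables) {
    print "Matched table at (", join(',', $tbl->coords), ")\n";
    print join(',', @$_), "\n" for $tbl->rows;
}
```

Comparing the printed coordinates against a run without the headers option (as in the first snippet) shows which tables on the page were skipped.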
Upvotes: 3