Tim

Reputation: 214

Get the data in a table from a website and output it in table format using Perl

I have tried several methods from previous questions on how to parse table information from a website, such as HTML::TableExtract and HTML::Parser, but none of them works for me. Below is my code.

my $browser = LWP::UserAgent->new( ssl_opts => { verify_hostname => 0, } );
my $url = 'http://reitdata.com/';

my $response = $browser->get($url);
die "Error at $url\n ", $response->status_line, "\n Aborting" unless $response->is_success;

my $te = HTML::TableExtract->new( headers => [qw(REIT PERIOD MKT DPU YIELD NAV GEARING ASSETS)]);
$te->parse($browser);

foreach my $ts ($te->tables) {
    print "Table (", join(',', $ts->coords), "):\n";
    foreach my $row ($ts->rows) {
        print join(',', @$row), "\n";
    }
}

The code above shows no output. What is wrong with the code for getting the table information from the website? Additionally, can I output the information from the website in table form? Below is the HTML code for the table.

<select name="ww" size="1" style="font-family: sans-serif; font-size: 9pt;" onchange="location.href = '/~sipesoft/cgi/sipesoft.cgi?report=ndashboard-'+ document.myform.family.value + ':' + document.myform.rpt.value + '*' + document.myform.ww.value"><option selected value="201730">201730&nbsp;&nbsp;</option>
<option value="201729">201729&nbsp;&nbsp;</option>
<option value="201728">201728&nbsp;&nbsp;</option>
<option value="201727">201727&nbsp;&nbsp;</option>
<option value="201726">201726&nbsp;&nbsp;</option>
<option value="201725">201725&nbsp;&nbsp;</option>
<option value="201724">201724&nbsp;&nbsp;</option>
<option value="201723">201723&nbsp;&nbsp;</option>
<option value="201722">201722&nbsp;&nbsp;</option>
</tr>
<tr>
<td><hr color="#000000" size="2"></td>
</tr>
<tr>
<td>
<table border=0 align=center cellspacing=0 cellpadding=0>
<tr>
<td>
<table border=1 align=left cellspacing=3 cellpadding=2>
<tr>
<td align="center" valign="bottom" bgcolor="#C0C0C0" width="45"><b><font face="Tahoma" size="1">Name</font></b></td>
<td align="center" valign="bottom" bgcolor="#C0C0C0" width="60"><b><font face="Tahoma" size="1">Age</font></b></td>
<td align="center" valign="bottom" bgcolor="#C0C0C0" width="40"><b><font face="Tahoma" size="1">Mark<br>Count</font></b></td>
<td align="center" valign="bottom" bgcolor="#C0C0C0" width="40"><b><font face="Tahoma" size="1">Grade</font></b></td>
<td align="center" valign="bottom" bgcolor="#C0C0C0" width="40"><b><font face="Tahoma" size="1">Hobby</font></b></td>
<td align="center" valign="bottom" bgcolor="#C0C0C0" width="40"><b><font face="Tahoma" size="1">Attendence</font></b></td> 
</tr>
</table>

Upvotes: 1

Views: 362

Answers (1)

zdim

Reputation: 66964

To get us on the same page, this is how we can pull tables from this page:

use warnings;
use strict;
use feature 'say';

use LWP::UserAgent;
use HTML::TableExtract;

my $url = 'https://stackoverflow.com/q/45452726/4653379';

my $ua = LWP::UserAgent->new;    
my $response = $ua->get($url);
die "Error at $url\n ", $response->status_line if not $response->is_success;
my $page = $response->decoded_content;

my $te = HTML::TableExtract->new;
$te->parse($page);

foreach my $tbl ($te->tables) {
    say "Table (", join(',', $tbl->coords), ")";
}

with output

Table (1,0)
...
Table (0,3)

Here is a table from the URL in the question, with a caveat.

use warnings;
use strict;
use open ':std', ':encoding(UTF-8)';

use LWP::UserAgent;
use HTML::TableExtract;
use Text::Table;

my $url = q(http://reitdata.com/);

my $ua = LWP::UserAgent->new;
my $response = $ua->get($url);
die "Error at $url\n ", $response->status_line if not $response->is_success;
my $page = $response->decoded_content;

my @headers = qw(REIT PERIOD MKT DPU YIELD NAV GEARING ASSETS);
my $te = HTML::TableExtract->new( headers => \@headers );
$te->parse($page);

my @data;
foreach my $tbl ( ($te->tables)[1] ) {  # just the second one
    foreach my $row ($tbl->rows) {
        my @row = map { s{^\s*|\s*$}{}gr } @$row;
        push @data, \@row;
    }   
}

my $tb = Text::Table->new( map { $_, \'  ' } @headers );   #'
$tb->load( @data );
print $tb;

The regex in the map block uses the non-destructive /r modifier, which returns the changed string and leaves the original unchanged. It requires Perl v5.14.0 or later; with older versions use map { s{..}{}g; $_ }, which changes the original elements in place.
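
A minimal sketch of the difference between the two forms, run on made-up cell values rather than on the real page (the @cells data and the variable names are illustrative assumptions). The second form copies each element first, so it also leaves the source array untouched on older Perls:

```perl
use strict;
use warnings;

# Hypothetical cell values with stray whitespace, standing in for one table row.
my @cells = ("  SoilbuildBizREIT ", "\t\$0.710\n", "8.259%  ");

# With /r (Perl 5.14+): s///r returns the modified copy, @cells is untouched.
my @trimmed = map { s{^\s+|\s+$}{}gr } @cells;

# Without /r: copy the element, substitute in the copy, and return the copy.
# Writing map { s{..}{}g; $_ } @cells directly would also change @cells,
# because $_ is aliased to each element inside map.
my @trimmed_old = map { my $c = $_; $c =~ s{^\s+|\s+$}{}g; $c } @cells;

print join('|', @trimmed), "\n";
print join('|', @trimmed_old), "\n";
```

Both print statements should show the same trimmed values, and @cells keeps its original whitespace.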

The table is printed using Text::Table. The good old printf can do this job as well.
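
As a rough sketch of the printf route, one option is to compute each column's width from the data, build a format string from those widths, and print every row with it. The sample rows below are a small hand-typed subset with an ASCII hyphen in the period column (so that length counts characters without any encoding setup), and the variable names are illustrative:

```perl
use strict;
use warnings;
use List::Util qw(max);

my @headers = qw(REIT PERIOD MKT);
my @data = (
    [ 'SoilbuildBizREIT', 'Q2 - Jun17', '$0.710' ],
    [ 'Cache Log Trust',  'Q2 - Jun17', '$0.885' ],
);

# The width of each column is the length of its longest cell, header included.
my @width = map {
    my $col = $_;
    max map { length $_->[$col] } \@headers, @data;
} 0 .. $#headers;

# Left-justified fields separated by two spaces.
my $fmt = join('  ', map { "%-${_}s" } @width) . "\n";

printf $fmt, @headers;
printf $fmt, @$_ for @data;
```

With real page data the cells may contain non-ASCII characters such as the en dash, in which case the strings need to be decoded (as decoded_content does) for length to give column widths in characters.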

For more on table processing see this post, and this one with links, for example.

This prints

REIT              PERIOD      MKT     DPU      YIELD   NAV      GEARING  ASSETS                                                                                                                           
SoilbuildBizREIT  Q2 – Jun17  $0.710   1.4660  8.259%  $0.720   37.90%   Industrial (12) : Business Park 32% + Industrial 68% by NPI                                                                      
Cache Log Trust   Q2 – Jun17  $0.885   1.8000  8.158%  $0.770   43.40%   Industrial (19) : Singapore (83%) + Australia (16%) + China (1%) by Gross Revenue                                                
Viva Ind Tr       Q2 – Jun17  $0.925   1.861   8.069%  $0.790   39.10%   Industrial (9) : Biz Park (50.4%) + Light Industrial (23.4%) + Logistics (15.4%) + Hotel (10.8%) by NPI                          
EC World Reit     Q1 – Mar17  $0.775   1.5410  8.065%  $0.900   28.60%   Port, Warehouse & e-Commerce Infrastructure in China                                                                             
Lippo Malls Tr    Q1 – Mar17  $0.460   0.890   7.739%  $0.374   32.20%   Retail (Indonesia) – 20                                                                                                          
BHG Retail Reit   Q1 – Mar17  $0.735   1.3900  7.565%  $0.820   32.50%   Retail (China) – 5                                                                                                               
AIMSAMP Cap Reit  Q1 – Jun17  $1.440   2.500   7.500%  $1.386   36.30%   Industrial (27) : Singapore + Australia                                                                                          
IREIT Global      Q1 – Mar17  $0.790   1.4400  7.291%  $0.672   42.10%   Offices : Germany (5)                                                                                                            
Sabana REIT       Q2 – Jun17  $0.450   0.810   7.222%  $0.560   37.00%   Industrial (21)                                                                                                                  
ManulifeREIT USD  Q1 – Mar17  $0.920   1.6500  7.174%  $0.830   34.20%   Offices : USA (3)                                                                                                                
OUE Com Reit      Q1 – Mar17  $0.730   1.230   6.973%  $0.860   36.20%   Office (82.6%) + Retail (17.4%) ; Singapore (79.9%) + China (20.1%) by Revenue                                                   
OUE Htrust        Q1 – Mar17  $0.755   1.3000  6.887%  $0.760   38.10%   Hotel (78%) + Retail (22%) by NPI                                                                                                
Frasers Com Tr    Q3 – Jun17  $1.400   2.398   6.871%  $1.520   35.90%   Singapore (52.7%) + Australia (47.3%) by NPI                                                                                     
ESR-REIT          Q2 – Jun17  $0.565   0.9560  6.768%  $0.633   37.90%   Industrial (49)                                                                                                                  
Ascendas-hTrust   2H – Mar17  $0.840   3.010   6.762%  $0.920   32.20%   Hotels (11) : Australia (51%) + Japan (29%) + Singapore (14%) + China (6%) by NPI                                                
FHT               Q3 – Jun17  $0.740   1.2374  6.689%  $0.749   34.10%   Hotel (9) + Serviced Apt (6) : Australia (38%) + Singapore (20%) + UK (17%) + Japan (14%) + Malaysia (6%) + Germany (5%) by NPI  
Mapletree GCC Tr  Q1 – Jun17  $1.110   1.851   6.614%  $1.244   39.40%   Retail + Office : HK (69.4%) + China (30.6%) by NPI ; Retail (62%) + Office (36.5%) by NPI                                       
Ascott Reit       1H – Jun17  $1.190   3.3560  6.511%  $1.190   32.40%   Serviced Apts (73) : Asia Pacific (61.6%) + Europe (28.4%) + US (10%) by Assets                                                  
First REIT        Q2 – Jun17  $1.350   2.140   6.393%  $1.004   31.00%   Hospitals (13 – 1 in S Korea) + Hotel (Indonesia – 2) + Nursing Home (Singapore – 3)                                             
Mapletree Ind Tr  Q1 – Jun17  $1.855   2.9200  6.296%  $1.400   29.80%   Industrial (86)                                                                                                                  
Mapletree Log Tr  Q1 – Jun17  $1.200   1.887   6.290%  $1.020   39.00%   Industrial (127)                                                                                                                 
Far East HTrust   Q1 – Mar17  $0.670   0.9300  6.239%  $0.903   32.30%   Hotels (65.2%) + Commercial (23.1%) + Serviced Apts (11.7%) by Revenue                                                           
CapitaR China Tr  1H – Jun17  $1.660   5.360   6.078%  $1.520   35.30%   Retail (China) – 11                                                                                                              
Frasers L&I Tr    Q3 – Jun17  $1.095   1.7500  6.076%  $0.920   29.30%   Industrial (Australia) – 54                                                                                                      
StarhillGbl Reit  Q4 – Jun17  $0.780   1.180   6.064%  $0.910   35.30%   Retail + Office : Singapore (62.5%) + Australia (23.0%) + Malaysia (12.5%) + Others (2.0%) by Revenue                            
CDL Htrust        1H – Jun17  $1.600   4.1000  6.031%  $1.545   38.70%   Hotels : Singapore (58.1%) + Australia (10.2%) + Maldives (7.6%) + NZ (14.2%) + UK (6.1%) + Japan (3.7%) by NPI                  
Ascendas Reit     Q1 – Jun17  $2.700   4.049   5.811%  $2.040   33.90%   Industrial (132) : Singapore (86%) + Australia (14%) by Valuation                                                                
Keppel DC REIT    1H – Jun17  $1.280   3.6300  5.672%  $0.931   27.70%   Data Centres – 12 + 1 (Under Devt)                                                                                               
Frasers Cpt Tr    Q3 – Jun17  $2.100   3.000   5.593%  $1.920   30.00%   Retail (6) + 31.17% of Hektar (MREIT)                                                                                            
CapitaMall Trust  Q2 – Jun17  $2.010   2.7500  5.542%  $1.910   34.70%   Retail (16) + Office                                                                                                             
SPHREIT           Q3 – May17  $1.000   1.370   5.520%  $0.940   25.60%   Retail (2)                                                                                                                       
Mapletree Com Tr  Q1 – Jun17  $1.605   2.2300  5.495%  $1.370   36.40%   Retail + Office                                                                                                                  
CapitaCom Trust   1H – Jun17  $1.720   4.590   5.337%  $1.770   35.20%   Office (73%) + Retail (16%) + Hotel (11%) by Gross Rental Income                                                                 
Suntec Reit       Q2 – Jun17  $1.900   2.4930  5.289%  $2.094   36.10%   Office (69%) + Retail (28%) + Convention (3%) by Income                                                                          
Fortune Reit HKD  1H – Jun17  $9.720  25.530   5.253%  $13.390  28.40%   Retail (HK) – 17                                                                                                                 
Keppel Reit       Q2 – Jun17  $1.160   1.4200  4.897%  $1.400   38.50%   Office (8) : Singapore (89%) + Australia (11%) by Asset Value                                                                    
ParkwayLife Reit  Q2 – Jun17  $2.710   3.320   4.576%  $1.680   37.40%   Hospitals + Nursing Homes = 49 : Singapore 60% + Japan 40% by Gross Revenue                                                      
Saizen REIT       2H – Jun15  $0.033   2.930   0.000%  $1.210   35.00%   Residential (Japan) – 136 

The caveat: this isn't the second table on the page but the one in the section "July 2017." The module sees only the very first table and this one, which has to do with the website. That is a separate issue, which I have to leave for now.

Upvotes: 3
