KidSudi

Reputation: 492

Obtaining table data in Perl

I am trying to obtain the contents of the large table on the following webpage: http://www.basketball-reference.com/players/j/jamesle01/gamelog/2013/. I then want to save the contents to a spreadsheet. All of this is to be done in Perl. I'm not really sure how to proceed with this. Any help would be greatly appreciated.

Also, just above the large table there is a CSV option. I suspect that might make it easier to get the table data into an Excel spreadsheet. Any advice on this?

Thanks

Upvotes: 0

Views: 206

Answers (2)

Sinan Ünür

Reputation: 118166

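Once you have the HTML file saved locally, you can parse it with HTML::TableExtract. The script below prints the table as tab-separated text; redirect that output to a file (e.g. perl gamelog.pl > gamelog.txt) and import the file into Excel: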

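#!/usr/bin/env perl

use utf8;
use v5.12;
use strict;
use warnings;
use warnings qw(FATAL utf8);
use open qw(:std :utf8);

# see http://www.perl.com/pub/2012/04/perlunicook-standard-preamble.html

use File::Slurp qw( read_file );
use HTML::TableExtract;

# Saved copy of the game log page; pass another path as the first argument.
my $file    = shift // 'index.html';
my $content = read_file $file, binmode => ':utf8';

# Select the game log table by its HTML id attribute.
my $te = HTML::TableExtract->new(attribs => {id => 'pgl_basic'});

$te->parse($content);
my ($table) = $te->tables;
die "No table with id 'pgl_basic' found in $file\n" unless $table;

my $seen_header = 0;
for my $row ($table->rows) {
    # Empty cells come back as undef; print them as empty fields.
    my @cells = map { defined($_) ? $_ : '' } @$row;

    # Rows whose first cell is 'Rk' are header rows; keep only the first one
    # so the spreadsheet gets column names but no repeated headers.
    if (@cells && $cells[0] eq 'Rk') {
        next if $seen_header++;
    }
    print join("\t", @cells), "\n";
}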
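The script above assumes index.html is already on disk. A minimal sketch for downloading it from Perl with HTTP::Tiny (bundled with Perl since 5.14) is below; the fetch_to_file helper, the script name and the agent string are hypothetical, and an https URL additionally needs IO::Socket::SSL and Net::SSLeay installed.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper: download $url and save the response body to $file.
# mirror() sends If-Modified-Since when $file already exists, so re-running
# the script does not re-download an unchanged page.
sub fetch_to_file {
    my ($url, $file) = @_;
    # The agent string is an arbitrary example value.
    my $http = HTTP::Tiny->new(agent => 'gamelog-fetch/0.1');
    my $res  = $http->mirror($url, $file);

    # success is true for 2xx responses and for 304 Not Modified.
    unless ($res->{success}) {
        # Status 599 means HTTP::Tiny itself failed (DNS, connect, TLS);
        # the error text is then in the response content.
        my $why = $res->{status} == 599 ? $res->{content} : $res->{reason};
        chomp $why;
        die "GET $url failed: $res->{status} $why\n";
    }
    return $file;
}

# Usage: perl fetch_gamelog.pl URL [OUTFILE]   (OUTFILE defaults to index.html)
if (@ARGV) {
    my ($url, $file) = @ARGV;
    fetch_to_file($url, $file // 'index.html');
}
```

For example, perl fetch_gamelog.pl http://www.basketball-reference.com/players/j/jamesle01/gamelog/2013/ index.html saves the page where the parsing script expects it.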

Upvotes: 1

PaulProgrammer

Reputation: 17700

If you can get the data as a CSV file, you can open it directly in Excel; no transformation is required.

Parsing HTML is tricky and error-prone, because the HTML that real pages serve (and that browsers happily accept) can be quite messy.

If you really need to write an XLS file, you can read the CSV (for instance, with Text::CSV) and then write the binary XLS format using something like Spreadsheet::WriteExcel.

Note: I've used Text::CSV before, and it's pretty reasonable. I have no experience with Spreadsheet::WriteExcel.
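A minimal sketch of that CSV-to-XLS route, assuming Text::CSV and Spreadsheet::WriteExcel are installed from CPAN and the CSV export has been saved to a local file. The csv_to_xls helper, the worksheet name and the file names are hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV;
use Spreadsheet::WriteExcel;

# Hypothetical helper: copy every row of $csv_file into one worksheet of
# a new .xls workbook at $xls_file. Returns the number of rows written.
sub csv_to_xls {
    my ($csv_file, $xls_file) = @_;

    # binary => 1 allows non-ASCII and embedded newlines in fields;
    # auto_diag => 1 reports parse errors automatically.
    my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 })
        or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

    open my $in, '<:encoding(UTF-8)', $csv_file
        or die "Cannot read $csv_file: $!\n";

    # new() returns undef if the output file cannot be created.
    my $workbook = Spreadsheet::WriteExcel->new($xls_file)
        or die "Cannot create $xls_file: $!\n";
    my $sheet = $workbook->add_worksheet('Game log');

    my $r = 0;
    while (my $fields = $csv->getline($in)) {
        # write_row() writes the array ref across row $r starting at column 0.
        $sheet->write_row($r++, 0, $fields);
    }
    close $in;
    $workbook->close;    # flush the workbook to disk
    return $r;
}

# Usage: perl csv_to_xls.pl input.csv output.xls
if (@ARGV == 2) {
    my $rows = csv_to_xls(@ARGV);
    print "Wrote $rows rows to $ARGV[1]\n";
}
```

Spreadsheet::WriteExcel's write() (which write_row() calls per cell) guesses each cell's type, so numeric-looking strings are stored as numbers and strings starting with = are treated as formulas. It only produces the legacy .xls format; Excel::Writer::XLSX offers a largely compatible interface for .xlsx. To load tab-separated output instead of CSV, pass sep_char => "\t" to Text::CSV->new.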

Upvotes: 1
