Ash_and_Perl

Reputation: 358

Issues with Spreadsheet::Read

In my application, we use Spreadsheet::Read to read an Excel file, perform some tasks on the rows, and finally add them to a database.

The file we are importing is an Excel file (.xlsx). It is a glossary that supports different user languages.

The problem I am facing is that some rows/columns contain cells with special characters, and these are not decoded correctly.

For example, with a Spanish Excel file:


In the Excel sheet => Extract from the logs


Información de cuenta => Informaci\x{f3}n de cuenta

Página de consola de administración de curso => P\x{e1}gina de consola de administraci\x{f3}n de curso

Informaci\x{f3}n de cuenta is added to the database, and when it is fetched the UI displays extraneous characters.

I tried the following solution, but it does not work. It is basically a hack of Spreadsheet::Read:

use Text::Iconv;
package Spreadsheet::XLSX;

sub new {
    my $converter = Text::Iconv->new("ASCII","utf-8");
    return __PACKAGE__->SUPER::new(@_, $converter);
}

What is wrong here, or is there a better solution?

Upvotes: 2

Views: 1621

Answers (1)

daxim

Reputation: 39158

Spreadsheet::Read returns strings as octets encoded in Latin1. To turn them into Perl character strings, decode them with the Encode module. Read the introduction to the topic of encoding in Perl.

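use strict;
use warnings;
use Encode qw(decode);
use Spreadsheet::Read qw(ReadData);
# Read the workbook; $ref->[1] is the first sheet.
my $ref = ReadData 'spanish.xls';
# Decode the Latin-1 octets in cell A1 into Perl characters.
my $characters = decode 'Latin-1', $ref->[1]{A1};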
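
To see what the decode step does without a spreadsheet at hand, here is a minimal sketch that uses only the core Encode module. The hard-coded `$octets` value is a hypothetical stand-in for a cell value, assuming the cell really does hold Latin-1 octets. The final encode to UTF-8 is also an assumption about what your database or UI expects, not something the question confirms.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical cell value: the Latin-1 octets for "Información de cuenta".
my $octets = "Informaci\xF3n de cuenta";

# Octets -> Perl characters.
my $characters = decode('ISO-8859-1', $octets);

# Perl characters -> UTF-8 octets, for a sink that expects UTF-8
# (assumed here; check what your database handle and UI actually use).
my $utf8 = encode('UTF-8', $characters);

printf "%d characters, %d UTF-8 octets\n", length($characters), length($utf8);
```

The point of the round trip is to decode once on input and encode once on output. Extraneous characters in a UI often mean that one of those two steps was skipped or applied twice.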

Upvotes: 3
