Anugraha Sinha

Reputation: 670

Perl CGI script print (utf-8 encoded - japanese html) over http (apache httpd) getting truncated

Environment settings

OS : RHEL 6.6 (kernel 2.6.32) - x86_64

httpd : httpd-2.2.15-39

perl : 5.10.1-136

CGI API :

perl-CGI-3.15-136

perl-CGI-Session-4.35-6

I am using a static HTML page that contains Perl CGI variables. This HTML is read in through Perl and then passed to a Perl CGI script, which runs eval on it. Note:

While reading the static HTML, I use UTF-8 encoding, like this:

open( IN ,"<:encoding(UTF-8)", $file_path )

After reading the static HTML page, the content is passed back to the CGI script in a variable and then passed to eval so that the variables are evaluated.

Finally, the evaluated output is printed by the CGI script so that the HTTP daemon can serve it. In the CGI script I am using

binmode(STDIN, ':encoding(UTF-8)');
binmode(STDOUT, ':encoding(UTF-8)');
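
The read and output setup described above can be sketched roughly as follows. This is a minimal sketch and not the original script: the helper name `read_template` is hypothetical, and it assumes the template file really is stored as UTF-8.

```perl
use strict;
use warnings;

# Hypothetical helper: read a UTF-8 encoded template into a string of characters.
sub read_template {
    my ($file_path) = @_;
    open( my $in, '<:encoding(UTF-8)', $file_path )
        or die "Cannot open $file_path: $!";
    local $/;            # slurp mode: read the whole file at once
    my $html = <$in>;
    close($in);
    return $html;
}

# Decode input and encode output as UTF-8, as in the CGI script above.
binmode( STDIN,  ':encoding(UTF-8)' );
binmode( STDOUT, ':encoding(UTF-8)' );
```

With this layering the string returned by `read_template` holds characters rather than bytes, so it is encoded exactly once when it is printed to STDOUT.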

The static HTML looks something like this: actual_html (static)

When I check the output of print in the CGI script, I see the complete output as desired, like this: cgi print data

However, in the browser the hidden input fields are truncated. Like this:

truncated html

When I checked the Wireshark capture of the text/html response sent from the server to the browser, it was truncated as well. Like this: wireshark scan

Are we supposed to use some special encoding for Japanese? Or is it a double-encoding issue? I tried removing the encoding layer when reading the static HTML file, but that did not help either.

Upvotes: 1

Views: 827

Answers (1)

Anugraha Sinha

Reputation: 670

I found the cause of the problem. In our code we store the language in the CGI::Session object, in the following form:

$session->param( KEY_SESSION_LANG, $code )

Here $session is a CGI::Session object, KEY_SESSION_LANG is 'language', and $code is derived from HTTP_ACCEPT_LANGUAGE. When we saw 'en' we set it to the Perl constant 'en', and when we saw 'ja' we set it to the Perl constant 'ja'.

On the older environment (RHEL 5.x, kernel 2.6.18-308, perl-CGI-Session-4.42-2), the session file created for the JA language looked like this:

cat cgisess_e31c8d21af82b59fd064babc7ca25c01
$D = {'_SESSION_ID' => 'e31c8d21af82b59fd064babc7ca25c01','_SESSION_ETIME' => 6000,'language' => 'ja','permit' => 'yes','_SESSION_REMOTE_ADDR' => '192.168.101.1','_SESSION_CTIME' => 1441090386,'execute' => 'yes','_SESSION_ATIME' => 1441090387,'_SESSION_EXPIRE_LIST' => {}};*a = \undef;;$D

With perl-CGI-Session on RHEL 6.6, the same session file comes out as

cat cgisess_e31c8d21af82b59fd064babc7ca25c01
$D = {'_SESSION_ID' => 'e31c8d21af82b59fd064babc7ca25c01','_SESSION_ETIME' => 6000,'language' => *a,'permit' => 'yes','_SESSION_REMOTE_ADDR' => '192.168.101.1','_SESSION_CTIME' => 1441090386,'execute' => 'yes','_SESSION_ATIME' => 1441090387,'_SESSION_EXPIRE_LIST' => {}};*a = \undef;;$D

The language value for ja has become *a. The same is visible when the in-memory session data is dumped with Data::Dumper.

I checked /usr/share/perl5/vendor_perl/CGI/Session.pm and it contains the following information:

=head1 A Warning about UTF8

Trying to use UTF8 in a program which uses CGI::Session has lead to problems. See RT#21981 and RT#28516.

In the first case the user tried "use encoding 'utf8';" in the program, and in the second case the user tried "$dbh->do(qq|set names 'utf8'|);".

Until this problem is understood and corrected, users are advised to avoid UTF8 in conjunction with CGI::Session.

For details, see: http://rt.cpan.org/Public/Bug/Display.html?id=28516 (and ...id=21981).

=head1 TRANSLATIONS

This document is also available in Japanese.

This is what I saw when I dumped the session object. I quote below from my official analysis presented on our local development portal.

  • I think the problem is caused by the perl-CGI-Session OSS package; please see the analysis below.

Some inputs from the CGI::Session source code.

 ## Inputs ##
From the file /usr/share/perl5/vendor_perl/CGI/Session.pm 
## Following are the status of CGI session, set internally after modification to any of the parameters ##

sub STATUS_NEW      () { 1 }        # denotes session that's just created
sub STATUS_MODIFIED () { 2 }        # denotes session that needs synchronization
sub STATUS_DELETED  () { 4 }        # denotes session that needs deletion
sub STATUS_EXPIRED  () { 8 }        # denotes session that was expired.

--snip--

CGI::Session - persistent session data in CGI applications

=head1 SYNOPSIS

    # Object initialization:
    use CGI::Session;
    $session = new CGI::Session();

    $CGISESSID = $session->id();

  1. We set the "language" parameter in the session object. (We create a CGI object, set a cookie on it to get the sid, and use the sid to get the session object.) To set the language parameter we call $session->param( 'language', <language_value> ), where language_value is en (English) or ja (Japanese). After the HTML page had been printed in the /opt/packageManager/pm_gui/cgi/status.cgi file, I checked the CGI session object; it is as follows.
  2. For EN Language

Before executing session flush

[09/10/2015 16:13:41] [23722] <ERROR> status.cgi : 267 :
 $VAR1 = bless( {
                 '_STATUS' => 2,

                 '_DATA' => {
                              '_SESSION_ETIME' => 6000,
                              '_SESSION_ID' => '995d11334f2c39b95b3fdb86cecd9655',
                              'permit' => 'yes',
                              'language' => 'en',

After this, when I flush the session with $session->flush() and check the session object again, it is

[09/10/2015 16:13:41] [23722] <ERROR> status.cgi : 270 : 
 $VAR1 = bless( {
                 '_STATUS' => 0,

                 '_DATA' => {
                              '_SESSION_ETIME' => 6000,
                              '_SESSION_ID' => '995d11334f2c39b95b3fdb86cecd9655',
                              'permit' => 'yes',
                              'language' => 'en',

Inference 1: The session status changed from 2 (STATUS_MODIFIED) to 0 after the flush. This is the expected behaviour.

  3. For JP Language

Before executing session flush

[09/10/2015 16:14:54] [31910] status.cgi : 267 :
 $VAR1 = bless( {
                 '_STATUS' => 2,

                 '_DATA' => {
                              '_SESSION_ID' => '1cd1b7860af4c71264f3969fe74e7a44',
                              '_SESSION_ETIME' => 6000,
                              'language' => *a


After this, when I flush the session with $session->flush() and try to check the session object, there is no output at all. The script crashes at the flush itself.

Inference 2: Flushing the session with the JP language terminates the script, so the session is destroyed. That is why the end of the response is truncated.

Because the wrong value is set in the in-memory session object, the implicit flush by CGI::Session fails when writing to disk. This terminates the session object, and the HTML that was still to be sent is lost.
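
One way to make such a failure visible instead of silently losing the rest of the page is to trap it around the flush. The following is only a sketch: `try_flush` is a hypothetical helper that is not part of CGI::Session, and it assumes the failure surfaces as a Perl exception (die) raised from `flush()`.

```perl
use strict;
use warnings;

# Hypothetical helper: call flush() on a session-like object and trap any
# fatal error, so the caller can log it and still finish printing the page.
sub try_flush {
    my ($session) = @_;
    my $ok = eval { $session->flush(); 1 };
    return ( 1, undef ) if $ok;
    my $err = $@ || 'unknown error';
    return ( 0, $err );
}
```

A call such as `my ($ok, $err) = try_flush($session); warn "flush failed: $err" unless $ok;` would then put the reason in the error log.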

I checked the actual code in Session.pm, and the value seems to be stored here:

sub param {
    my ($self, @args) = @_;
    --snip--
    # USAGE: $s->param($name, $value);
    # USAGE: $s->param($name1 => $value1, $name2 => $value2 [,...]);
    # DESC:  updates one or more **public** records using simple syntax
    if ((@args % 2) == 0) {
        my $modified_cnt = 0;
        ARG_PAIR:
        while (my ($name, $val) = each %args) {
            if ( $name =~ m/^_SESSION_/) {
                carp "param(): attempt to write to private parameter";
                next ARG_PAIR;
            }
            $self->{_DATA}->{ $name } = $val;   # <---- HERE
            ++$modified_cnt;
        }
        $self->_set_status(STATUS_MODIFIED);
        return $modified_cnt;
    }
    --snip--
}

As a solution, we stopped passing the 'ja' value as a Perl constant and now pass it as a plain string "ja". It seems to be working fine now.
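
A small guard in front of the param() call can catch this kind of value before it reaches the session. This is a minimal sketch under assumptions: `is_plain_string` is a hypothetical helper, and it only checks that the value is a defined, ordinary scalar rather than a reference or a glob such as *a.

```perl
use strict;
use warnings;

# Hypothetical guard: accept only a defined, plain scalar
# (no references, no typeglobs).
sub is_plain_string {
    my ($value) = @_;
    return 0 unless defined $value;
    return 0 if ref $value;                      # reject references
    return ref( \$value ) eq 'SCALAR' ? 1 : 0;   # a copied glob reports 'GLOB'
}

my $code = "ja";   # plain string, as in the fix described above
die "language code is not a plain string" unless is_plain_string($code);
# $session->param( 'language', $code );   # $session: a CGI::Session object
```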

Upvotes: 0
