user3443814

Reputation: 5

Perl: How to print the result of a function to a file?

I have a script that contains these lines:

$lsa->get_corpus_vocabulary_and_word_counts();

$lsa->generate_document_vectors();

$lsa->display_corpus_vocab();

$lsa->display_doc_vectors();

But the output is displayed on the command line. I want it to be written to a file. So I tried this:

open(OUT, '>', "corpus.txt") or die;
print (OUT $lsa->display_corpus_vocab());
close (OUT);

But it does not work... Any help? Thanks!

EDIT:

Yes, it comes from the CPAN module Algorithm::VSM. I am still having some difficulty using this module.

Here is a link: https://metacpan.org/pod/Algorithm::VSM

The Perl file from which I extracted those few lines of code is "calculate_precision_and_recall_for_LSA".

Thanks for all your answers! I'll try it tomorrow and let you know as soon as possible if I managed to get it working :).

EDIT2:

Here is a complete example file:

#!/usr/bin/perl -w

### retrieve_with_LSA.pl

#use lib '../blib/lib', '../blib/arch';

use strict;
use Algorithm::VSM;

my $corpus_dir = "corpus";

my @query = qw/dog cat/;

my $stop_words_file = "stop_words.txt";    # This file will typically include the
                                       # keywords of the programming
                                       # language(s) used in the software.

#     The three databases named below store the corpus vocabulary word
#     frequency histogram, the doc vectors for the files in the corpus, and
#     the doc vectors in a reduced-dimensionality LSA representation of the
#     corpus, respectively.  After these three databases are created, you
#     can do VSM retrieval directly from the databases by running the
#     script retrieve_with_disk_based_LSA.pl.  Doing retrieval using a
#     pre-stored model of a corpus will, in general, be much faster since
#     you will be spared the bother of having to create the model.
my $corpus_vocab_db = "corpus_vocab_db";
my $doc_vectors_db  = "doc_vectors_db";
my $normalized_doc_vecs_db  = "normalized_doc_vecs_db";

my $lsa = Algorithm::VSM->new(
               corpus_directory         => $corpus_dir,
               corpus_vocab_db          => $corpus_vocab_db,
               doc_vectors_db           => $doc_vectors_db,
               normalized_doc_vecs_db   => $normalized_doc_vecs_db,
#                   use_idf_filter           => 0,
               stop_words_file          => $stop_words_file,
               want_stemming            => 1,        # Default is no stemming
               lsa_svd_threshold        => 0.01,# Used for rejecting singular
                                                # values that are smaller than
                                                # this threshold fraction of
                                                # the largest singular value.
               max_number_retrievals    => 10,
#                  debug                    => 1,
      );

$lsa->get_corpus_vocabulary_and_word_counts();

#   Uncomment the following if you would like to see the corpus vocabulary:
$lsa->display_corpus_vocab();

open (OUT, '>', "document_corpus.txt");
select OUT;
print OUT $lsa->get_corpus_vocabulary_and_word_counts();
print OUT $lsa->display_corpus_vocab();
select STDOUT;
close (OUT);

#    Uncomment the following statement if you would like to see the inverse
#    document frequencies:
#$lsa->display_inverse_document_frequencies();

$lsa->generate_document_vectors();

$lsa->display_doc_vectors();

$lsa->display_normalized_doc_vectors();

$lsa->construct_lsa_model();

my $retrievals = $lsa->retrieve_with_lsa( \@query );

$lsa->display_retrievals( $retrievals );

But it still doesn't work...

Upvotes: 0

Views: 120

Answers (3)

David W.

Reputation: 107040

I believe you're using Algorithm::VSM. Is that true?

I'm looking at the source code and I can see that the various methods have print statements embedded in them. For example $lsa->display_corpus_vocab prints out the line Displaying corpus vocabulary:, a list of words, and then Size of the corpus vocabulary: $vocab_size.

Fortunately, the print statements print to the default filehandle. This means we can use select, as jwodder suggests in their answer.

(If the author of this module wanted to be truly evil, they could have done print STDOUT "..."; which would mean even the select wouldn't work.)
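For that "evil" case, one possible workaround is to redirect STDOUT itself for the duration of the call. This is only a sketch: stubborn_report is a hypothetical stand-in for a method that hard-codes STDOUT, and stdout_capture.txt is an assumed file name. The dup-and-reopen steps follow the pattern described in the open documentation.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a module method that hard-codes STDOUT.
sub stubborn_report { print STDOUT "vocabulary size: 42\n" }

my $path = "stdout_capture.txt";    # assumed output file name

# Duplicate the real STDOUT so it can be restored afterwards.
open(my $saved_stdout, '>&', \*STDOUT) or die "Cannot dup STDOUT: $!";

# Reopen STDOUT onto the file; explicit `print STDOUT` now lands there.
open(STDOUT, '>', $path) or die "Cannot redirect STDOUT to $path: $!";
stubborn_report();
close(STDOUT) or die "Cannot close redirected STDOUT: $!";

# Put the original STDOUT back.
open(STDOUT, '>&', $saved_stdout) or die "Cannot restore STDOUT: $!";
```

Unlike select, this also catches output that names STDOUT explicitly, at the cost of redirecting everything the program prints to STDOUT in between.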

It also shows us why you never ever put print statements in a module. If a module is supposed to report information, it should return this information rather than print it directly, so the user can dispose of the information as they see fit.
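To illustrate the "return, don't print" idea, here is a small sketch. vocab_report is a hypothetical function, not part of Algorithm::VSM, and the in-memory handle merely stands in for a real file opened with a file name.

```perl
use strict;
use warnings;

# Hypothetical report builder: it returns the text instead of printing it.
sub vocab_report {
    my (%counts) = @_;
    my @lines = map { "$_  $counts{$_}" } sort keys %counts;
    return join("\n",
        "Displaying corpus vocabulary:",
        @lines,
        "Size of the corpus vocabulary: " . scalar(@lines)) . "\n";
}

my $report = vocab_report(dog => 3, cat => 2);

# The caller decides where the text goes. An in-memory handle is used here;
# a file name in place of \$file_contents would write to disk instead.
my $file_contents = '';
open(my $out, '>', \$file_contents) or die "Cannot open handle: $!";
print {$out} $report;
close($out) or die "Cannot close handle: $!";
```

With this design the same report could just as easily go to the terminal, a log, or a test assertion.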

However, you need to be careful with select. What if you wanted to print something to go somewhere besides this file? I would recommend that you use select before and after each call like this:

my $orig_fh = select;    # Original "default" filehandle (probably STDOUT)
...
select ( OUT );          # Send output of display_corpus_vocab to my file
$lsa->display_corpus_vocab;
select ( $orig_fh );     # Reset program back to STDOUT (or original default FH).
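The same save-and-restore pattern as a self-contained sketch. noisy_display is a hypothetical stand-in for a method such as display_corpus_vocab, and an in-memory handle replaces the real output file so the result is easy to inspect.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a method that prints to the selected handle.
sub noisy_display { print "Displaying corpus vocabulary:\n" }

my $captured = '';
open(my $out, '>', \$captured) or die "Cannot open in-memory handle: $!";

my $orig_fh = select($out);    # select returns the previously selected handle
noisy_display();               # this output goes to $out
select($orig_fh);              # restore before anything else prints
close($out) or die "Cannot close handle: $!";

print "Back on the original handle\n";    # goes to the original default
```

Because select returns the previous handle, the restore step works even if the default was not STDOUT to begin with.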

Addendum

I edited my first post. I used select, but it still doesn't work. Do you know what's wrong? Thanks again. – user3443814

At first, I was getting very concerned about the select statement. When I originally wrote this post, I had a slightly queasy feeling that select might be namespace sensitive and that, because the methods were in another package, the select statement wasn't quite going to work. I didn't see any documentation to that effect, so I felt assured that this wasn't the case.

When you posted a comment that said it didn't work, I thought oh no, but, after looking at what you did, I now see maybe the issue was your code. You have this:

open (OUT, '>', "document_corpus.txt");
select OUT;
print OUT $lsa->get_corpus_vocabulary_and_word_counts();
print OUT $lsa->display_corpus_vocab();
select STDOUT;
close (OUT);

You shouldn't be using print OUT at all, because the print statements are inside the methods get_corpus_vocabulary_and_word_counts and display_corpus_vocab themselves. Your print OUT only writes whatever those methods return.

Just try:

open (OUT, '>', "document_corpus.txt");
select OUT;
$lsa->get_corpus_vocabulary_and_word_counts();
$lsa->display_corpus_vocab();
select STDOUT;
close (OUT);

This should work. The methods' internal code contains both print and printf. The select documentation says it affects write and print and doesn't mention printf. However, the printf documentation says it's equivalent to print FILEHANDLE sprintf(FORMAT, LIST), and like print it uses the currently selected handle when no filehandle is given. So printf should be subject to the dictates of select as well.

If the file you wrote contains anything at all, we know that the select worked. If it doesn't contain all of your information, and some of it printed on the terminal anyway, then part of the module's output is bypassing select, for example through an explicit print STDOUT. I haven't tested that possibility yet.
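The printf question is easy to check in isolation. A minimal sketch, using an in-memory handle as an assumed stand-in for the output file:

```perl
use strict;
use warnings;

my $buffer = '';
open(my $out, '>', \$buffer) or die "Cannot open in-memory handle: $!";

my $previous = select($out);
print  "plain print\n";                      # no filehandle given
printf "%s has %d entries\n", 'vocab', 3;    # no filehandle given either
select($previous);
close($out) or die "Cannot close handle: $!";
```

After this runs, $buffer holds both lines, so print and printf behave the same way with respect to select.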

If your output file is completely empty, it could be that select is namespace sensitive.

You could try changing the namespace where we did the select and see if that takes care of matters:

package Algorithm::VSM;
open (OUT, '>', "document_corpus.txt");
select OUT;
package main;
$lsa->get_corpus_vocabulary_and_word_counts();
$lsa->display_corpus_vocab();
package Algorithm::VSM;
select STDOUT;
close (OUT);
package main;

The package statements change the namespace. By default, Perl uses the default main namespace. What I'm doing is changing to the Algorithm::VSM namespace when you open your file and do your select. This is the namespace that the Algorithm::VSM package is using. The hope is that if select is affected by the namespace it is in, this will take care of the issue.
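Whether select depends on the namespace can also be tested without the module. In this sketch, Demo::Printer is a hypothetical package standing in for Algorithm::VSM, and the output goes to an in-memory handle.

```perl
use strict;
use warnings;

package Demo::Printer;    # hypothetical package standing in for the module
sub display { print "printed from Demo::Printer\n" }

package main;

my $buffer = '';
open(my $out, '>', \$buffer) or die "Cannot open in-memory handle: $!";

my $previous = select($out);    # the select happens in package main
Demo::Printer::display();       # the print runs in another package
select($previous);
close($out) or die "Cannot close handle: $!";
```

The text ends up in $buffer, which suggests that the selected handle is a program-wide setting and that a select done in main is still in effect for code in other packages.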

Otherwise, you need to peek behind the curtain, so to speak and take a look at the code in your Algorithm::VSM module. You can find where it resides by executing this command:

$ perldoc -l Algorithm::VSM

Look at the code of the subroutines get_corpus_vocabulary_and_word_counts and display_corpus_vocab and see whether those print statements are plain old print statements without filehandles, or whether the module is forcing the output to STDOUT.

If you can edit the module's code (depending on where it is installed, you may need administrator rights to do this), you can try adding:

warn "SELECT FH " . select;

inside the methods in the code and see what it prints out.
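For reference, this is roughly what such a diagnostic reports when nothing has been selected yet. The sketch only queries the current handle and does not change it.

```perl
use strict;
use warnings;

my $selected = select;    # no arguments: query only, nothing changes
my $message  = "SELECT FH $selected";
print STDERR "$message\n";
```

With the default handle still in place, the message names main::STDOUT. Seeing anything else inside the module's methods would indicate that the select had taken effect there.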

I hope that someone else will see this post, point out my complete ignorance and explain what's really happening and why I am completely wrong.

Upvotes: 1

jwodder

Reputation: 57460

You need to use select to make OUT the "currently selected filehandle" before calling any of the methods:

open(OUT, '>', "corpus.txt") or die;
select OUT; 
$lsa->get_corpus_vocabulary_and_word_counts();
$lsa->generate_document_vectors();    
$lsa->display_corpus_vocab();
$lsa->display_doc_vectors();
select STDOUT;  # This "unselects" OUT.
close OUT;

Upvotes: 6

Hunter McMillen

Reputation: 61515

A few alternatives, in case you don't want to use select:

1. Use the correct syntax to print to your filehandle

print {OUT} $lsa->display_corpus_vocab();
close(OUT);

2. Redirect the STDOUT of your script to a file

perl myscript.pl > corpus.txt  

Upvotes: 0
