user2963286

Reputation: 99

Word and line count and byte size of a file in Perl

I'm trying to write a Perl program that takes a file name from the command line and reports four things: the number of lines (which is working), the size of the file in bytes, the total number of words, and the number of occurrences of a particular search word, which is also given on the command line. The file size only prints when I put the print statement inside the while loop, but I think that prints the size of the current line, because it's different on each iteration.

How can I make it work as expected?

#!/usr/bin/perl

use strict;
use warnings;

my $linecount = 0;
my $wordcount = 0;
my $filesize = 0;
my $search = <>;

while (defined(my $file = <>)) {
    chomp($file);

    my $filesize = length $file;

    if (m/$search/){
        my $wordcount++;
    }

    $linecount = 1;
    $linecount++;
}

print "Size of file in bytes: $filesize\n";
print "Number of line(s): $linecount\n";
print "Number of occurences of $search: $wordcount\n";

Upvotes: 1

Views: 2289

Answers (3)

mpapec

Reputation: 50637

You wrote: "I'm using ./perl1.pl testfile.txt hello", with hello being your search word.

Reading from <> tells perl to read lines from all the files named on the command line (or from STDIN if no files are given).

Since command-line parameters are stored in @ARGV, and hello is not a file in your example, it should be removed from @ARGV and stored elsewhere (my $search = pop @ARGV;).

use strict;
use warnings;

my $linecount = 0;
my $wordcount = 0;
my $search = pop @ARGV;
my ($file) = @ARGV;
my $filesize = -s $file;

while (my $line = <>) {
    chomp($line);

    $wordcount++ while $line =~ /$search/g;

    $linecount++;
}

print "Size of file in bytes: $filesize\n";
print "Number of line(s): $linecount\n";
print "Number of occurrences of $search: $wordcount\n";

Upvotes: 1

TLP

Reputation: 67900

Quick code review:

use strict;
use warnings;

A very good choice. These pragmas provide information about your code, and help you avoid mistakes.

my $search = <>;

Here you take the first line of the input as the search string. This is probably not what you want. If you are searching through a file, I am guessing the file does not contain the search word in the first line. What you probably are trying to do is access the command line arguments, which are found in @ARGV.

my $search = shift;  

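This is the idiomatic way to access @ARGV. Outside a subroutine, it is short for shift @ARGV, which removes and returns the first argument in @ARGV. This assumes the search word comes first on the command line; if it comes last, use pop @ARGV instead.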

while (defined(my $file = <>)) {
    chomp($file);
    my $filesize = length $file;

I get the impression that you think that $file is actually the file name. You said you tried -s $file, which would have worked, if $file had contained the file name. However, the while loop reads from the input file handle <> and assigns the lines of the file to $file. If you want to access the file name, you probably want $ARGV. And you only want to do this once, after the while loop:

my $filesize = -s $ARGV;

Keep in mind that if you use more than one file, $ARGV will change as it refers to the name of the file currently being read with <>. (Technically <ARGV>)
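To illustrate how $ARGV follows the file currently being read, here is a minimal sketch that records the size of each file as <> moves through them. The helper name sizes_by_file is made up for this example, and the // operator assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: returns file name => size in bytes
# for every file that <> reads a line from.
sub sizes_by_file {
    local @ARGV = @_;    # let <> iterate over the given file names
    my %size;
    while (my $line = <>) {
        # $ARGV holds the name of the file the current line came from.
        $size{$ARGV} //= -s $ARGV;
    }
    return %size;
}

# Only run when file names were given, so <> never falls back to STDIN.
if (@ARGV) {
    my %size = sizes_by_file(@ARGV);
    print "$_: $size{$_} bytes\n" for sort keys %size;
}
```

Note that an empty file yields no lines, so it never shows up in the result of this sketch.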

The while loop itself should probably use a different variable name:

while (my $line = <>)

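Note that you do not technically need to use defined here: Perl adds the defined test implicitly when the while condition is a plain assignment from a line read.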

Also, length returns the number of characters in a string. If you use it on a file name, it returns the number of characters in the file name. It has nothing to do with file size.
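A small sketch of the difference, using a throwaway file whose name and contents are invented for the illustration:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a temporary file with known contents (example data only).
my ($fh, $name) = tempfile(UNLINK => 1);
binmode $fh;                      # write bytes exactly as given
print {$fh} "hello world\n";      # 12 bytes
close $fh or die "Cannot close $name: $!";

my $name_length = length $name;   # characters in the file NAME
my $file_size   = -s $name;       # bytes stored in the file itself

print "length of name: $name_length\n";
print "size in bytes:  $file_size\n";
```

The first number depends on whatever temporary path the system picks; only the second one describes the file's contents.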

if (m/$search/){
    my $wordcount++;
}

This pattern match applies to the default variable $_. What you want is $file =~ m/..../. Also, do you want meta characters to be active in the regex? You might want, for example, to allow plural s with /apples?/. Meta characters can change the meaning of your pattern match, however, so if you just want to match literal strings, use the \Q ... \E escape to disable meta characters.

Another thing, you use my here to declare a new variable (which shadows the previously declared variable). This variable only has the scope of the surrounding if block, so it is quite pointless. Remove my.

Yet another thing is that this match only matches once per line, so you miss out on multiple matches. What you want is probably this:

$wordcount += () = $line =~ /\Q$search\E/g;

Note the use of the global /g modifier, which makes the regex match as many times as possible (and not just once). Assigning the matches to the empty list () and evaluating that list assignment in scalar context yields the number of matches.
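As a quick check of the counting idiom, here is a sketch wrapped in a helper; the name count_matches and the sample strings are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: count non-overlapping literal occurrences
# of $search in $line.
sub count_matches {
    my ($line, $search) = @_;
    # A list assignment in scalar context yields the number of
    # elements on its right-hand side, here the number of matches.
    my $count = () = $line =~ /\Q$search\E/g;
    return $count;
}

print count_matches("hello, hello world", "hello"), "\n";   # prints 2
print count_matches("a.b a+b axb", "a.b"), "\n";            # prints 1: the dot is literal
```

Without \Q ... \E, the second call would count 3, because the unescaped dot matches any character.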

$linecount = 1;
$linecount++;

This sets the count to 2. No matter how many lines are in your file, this will never be more than 2. You want to remove the assignment.
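Putting these points together, one possible version of the whole program might look like the sketch below. It rests on a few assumptions that are not in the original code: the invocation is ./perl1.pl testfile.txt hello (file first, search word last), the search word is treated as a literal string, a "word" is any whitespace-separated token, and the file is opened explicitly rather than read through <>. The name file_stats is invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns (bytes, lines, words, matches) for one file.
sub file_stats {
    my ($file, $search) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    my $bytes = (-s $file) || 0;    # -s returns false for an empty file
    my ($lines, $words, $matches) = (0, 0, 0);
    while (my $line = <$fh>) {
        $lines++;
        my @fields = split ' ', $line;    # whitespace-separated words
        $words   += @fields;
        $matches += () = $line =~ /\Q$search\E/g;
    }
    close $fh;
    return ($bytes, $lines, $words, $matches);
}

# Assumed invocation: ./perl1.pl testfile.txt hello
if (@ARGV == 2) {
    my ($file, $search) = @ARGV;
    my ($bytes, $lines, $words, $matches) = file_stats($file, $search);
    print "Size of file in bytes: $bytes\n";
    print "Number of line(s): $lines\n";
    print "Number of words: $words\n";
    print "Number of occurrences of $search: $matches\n";
}
```

The words are collected into an array before counting because split applies an implicit limit when it is assigned directly to an empty list, which would give the wrong count.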

Upvotes: 2

Carlos Sanchez

Reputation: 1016

Do you know how pattern matching works in Perl? Here's what I'd do:

foreach my $match ($line =~ /\w+/g)
{
  # Compare each word with the search term (exact, case-sensitive)
  if ($match eq $search)
  {
     $wordcount++;
  }
}

I replaced "$file" with "$line", because the original name was a little confusing. In list context, the match $line =~ /\w+/g returns every sequence of one or more "word" characters in the line, and foreach assigns each of them to $match in turn. The "g" flag is for global, which means it finds all the matches in the line rather than stopping at the first one. There is no need to chomp $match, since \w never matches a newline, and an "i" flag would have no effect on \w+. Then, if the match is the same as our search variable, we increment our wordcount. Note that eq is case sensitive; compare lc($match) with lc($search) if you want to ignore case.

Upvotes: 0
