Reputation: 155
Here is my file:
heaven
heavenly
heavenns
abc
heavenns
heavennly
According to my code, only heavenns and heavennly should be pushed into @myarr, and each should appear in the array only once. How can I do that?
my $regx = "heavenn\+";
my $tmp=$regx;
$tmp=~ s/[\\]//g;
$regx=$tmp;
print("\nNow regex:", $regx);
my $file = "myfilename.txt";
my @myarr;
open my $fh, "<", $file;
while ( my $line = <$fh> ) {
if ($line =~ /$regx/){
print $line;
push (@myarr,$line);
}
}
print ("\nMylist:", @myarr); #printing 2 times heavenns and heavennly
Upvotes: 1
Views: 1497
Reputation: 755074
This is Perl, so There's More Than One Way To Do It (TMTOWTDI). Here's one of them:
#!/usr/bin/env perl
use strict;
use warnings;
my $regex = "heavenn+";
my $rx = qr/$regex/;
print "Regex: $regex\n";
my $file = "myfilename.txt";
my %list;
my @myarr;
open my $fh, "<", $file or die "Failed to open $file: $!";
while ( my $line = <$fh> )
{
if ($line =~ $rx)
{
print $line;
$list{$line}++;
}
}
push @myarr, sort keys %list;
print "Mylist: @myarr\n";
Sample output:
Regex: heavenn+
heavenns
heavenns
heavennly
Mylist: heavennly
heavenns
The sort isn't necessary (but it presents the data in a sane order). You could instead add items to the array as they are read, whenever the count in $list{$line} is still 0, which keeps them in input order. You could chomp the input lines to remove the newline. Etc.
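As a minimal sketch of those two suggestions (push on first sight, and chomp the input), assuming in-memory sample lines instead of a file: the helper name unique_matches is hypothetical and not part of the answer above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: returns the lines matching $rx, chomped,
# each one once, in order of first appearance.
sub unique_matches {
    my ($rx, @lines) = @_;
    my (%seen, @result);
    for my $line (@lines) {
        chomp $line;                  # strip the trailing newline, if any
        next unless $line =~ $rx;     # skip non-matching lines
        # $seen{$line}++ returns the old count, which is false only the first time
        push @result, $line unless $seen{$line}++;
    }
    return @result;
}

my @myarr = unique_matches(qr/heavenn+/,
    "heaven\n", "heavenly\n", "heavenns\n", "abc\n", "heavenns\n", "heavennly\n");
print "Mylist: @myarr\n";    # prints "Mylist: heavenns heavennly"
```

Because the array is filled as lines are seen, no sort is needed and the hash serves only as a "seen" set.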
What if I want to push only particular words? For example, if my file contains 1. "heavenns hello" 2. "heavenns hi", "3.heavennly good", what should I do to print only 'heavenns' and 'heavennly'?
Then you have to arrange to capture the word only. That means refining the regex. Assuming you want heavenn at the start of the word and don't mind what alphabetic characters come after that, then:
#!/usr/bin/env perl
use strict;
use warnings;
my $regex = '\b(heavenn[A-Za-z]*)\b'; # Single quotes necessary!
my $rx = qr/$regex/;
print "Regex: $regex\n";
my $file = "myfilename.txt";
my %list;
my @myarr;
open my $fh, "<", $file or die "Failed to open $file: $!";
while ( my $line = <$fh> )
{
if ($line =~ $rx)
{
print $line;
$list{$1}++;
}
}
push @myarr, sort keys %list;
print "Mylist: @myarr\n";
Data file:
1. "heavenns hello"
2. "heavenns hi",
"3.heavennly good". What to d
heaven
heavenly
heavenns
abc
heavenns
heavennly
Output:
Regex: \b(heavenn[A-Za-z]*)\b
1. "heavenns hello"
2. "heavenns hi",
"3.heavennly good". What to d
heavenns
heavenns
heavennly
Mylist: heavennly heavenns
Note that the names in the list no longer include newlines.
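The "Single quotes necessary!" comment in the script can be illustrated with a small sketch. In a double-quoted Perl string, "\b" is the backspace character, so the word-boundary assertion never reaches the regex engine. The variable names and the sample text here are assumptions chosen for illustration.

```perl
use strict;
use warnings;

my $single = '\b(heavenn[A-Za-z]*)\b';   # \b stays a regex word boundary
my $double = "\b(heavenn[A-Za-z]*)\b";   # "\b" becomes a backspace character (0x08)

my $text = "2. heavenns hi";
print "single-quoted matches: ", ($text =~ qr/$single/ ? "yes" : "no"), "\n";   # yes
print "double-quoted matches: ", ($text =~ qr/$double/ ? "yes" : "no"), "\n";   # no
```

The double-quoted pattern fails because the sample text contains no backspace characters for it to match.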
This version takes a regex from the command line. The script invocation is:
perl script.pl -p 'regex' [file ...]
It reads from standard input if no file is specified on the command line (far better than having a fixed input file name). It looks for multiple occurrences of the specified regex on each line, where the regex can be preceded or followed (or both) by 'word characters' as specified by \w.
#!/usr/bin/env perl
use strict;
use warnings;
use Getopt::Std;
my %opts;
getopts('p:', \%opts) or die "Usage: $0 [-p 'regex']\n";
my $regex_base = 'heavenn';
#$regex_base = $ARGV[0] if defined $ARGV[0];
$regex_base = $opts{p} if defined $opts{p};
my $regex = '\b(\w*' . ${regex_base} . '\w*)\b';
my $rx = qr/$regex/;
print "Regex: $regex (compiled form: $rx)\n";
my %list;
my @myarr;
while (my $line = <>)
{
while ($line =~ m/$rx/g)
{
print $line;
$list{$1}++;
#$line =~ s///;
}
}
push @myarr, sort keys %list;
print "Matched words: @myarr\n";
Given the input file:
1. "heavenns hello"
2. "heavenns hi",
"3.heavennly good". What to d
Good heavennsy! What a heavennnly output from an equally heavennnnly input!
An unheavenly host. Good heavens! It heaves to like a yacht!
heaven
Is it heavens
heavenly
heavenns
abc
heavenns
heavennly
You can get outputs such as:
$ perl script.pl -p 'e\w*?ly' myfilename.txt
Regex: \b(\w*e\w*?ly\w*)\b (compiled form: (?^:\b(\w*e\w*?ly\w*)\b))
"3.heavennly good". What to d
Good heavennsy! What a heavennnly output from an equally heavennnnly input!
Good heavennsy! What a heavennnly output from an equally heavennnnly input!
Good heavennsy! What a heavennnly output from an equally heavennnnly input!
An unheavenly host. Good heavens! It heaves to like a yacht!
heavenly
heavennly
Matched words: equally heavenly heavennly heavennnly heavennnnly unheavenly
$ perl script.pl myfilename.txt
Regex: \b(\w*heavenn\w*)\b (compiled form: (?^:\b(\w*heavenn\w*)\b))
1. "heavenns hello"
2. "heavenns hi",
"3.heavennly good". What to d
Good heavennsy! What a heavennnly output from an equally heavennnnly input!
Good heavennsy! What a heavennnly output from an equally heavennnnly input!
Good heavennsy! What a heavennnly output from an equally heavennnnly input!
heavenns
heavenns
heavennly
Matched words: heavennly heavennnly heavennnnly heavenns heavennsy
$
Upvotes: 1
Reputation: 129559
If you want to push only the first occurrence of a line, you can add the following in your loop, after the regex match:
# Assumes "my %seen;" is declared outside the loop.
next if $seen{$line}++;
More approaches to uniqueness: How do I print unique elements in Perl array?
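One further option, offered as a sketch and not necessarily among those in the linked question, is to collect every match and de-duplicate afterwards with uniq from List::Util. This assumes List::Util 1.45 or later is installed; the sample data is made up for illustration.

```perl
use strict;
use warnings;
use List::Util qw(uniq);   # uniq requires List::Util 1.45 or later

my @lines  = ("heavenns", "heavenns", "heavennly", "heavenns");
my @unique = uniq @lines;  # keeps the first occurrence of each value, in order
print "@unique\n";         # prints "heavenns heavennly"
```

This keeps the loop simple at the cost of holding the duplicates in memory until the end.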
Upvotes: 0
Reputation: 386706
For a given value in $_, !$seen{$_}++ is only true the first time it's executed.
my $regx = qr/heavenn/;
my @matches;
my %seen;
while (<>) {
chomp;
push(@matches, $_) if /$regx/ && !$seen{$_}++;
}
Upvotes: 1