rim

Reputation: 39

How to compare two arrays of strings

I have two input files: StopWordsList.txt, which contains a list of stop words, and text1.txt, which contains text. I want to remove from the text the words that are in StopWordsList.txt. Here is my code:

my $FichierResulat = '/home/lenovo/Bureau/MesTravaux/LeskAlgo/OriginalLeskResult';

open( my $FhResultat, '>:utf8', $FichierResulat );

open( my $fh1, "<:utf8", '/home/lenovo/Bureau/MesTravaux/LeskAlgo/DemoLesk/StopWordsList.txt' ) 
        or die "Failed to open file: $!\n"; #file contains stop words

open( my $fh2, "<:utf8", '/home/lenovo/Bureau/MesTravaux/LeskAlgo/text1.txt' ) #file contains text
        or die "Failed to open file: $!\n";

my @tabStopWords = <$fh1>;

my @tab_contexte;
my @words;

while ( <$fh2> ) {
    chomp;
    next if m/^$/;
    my $context = $_;
    @words = split( / /, $_ );
}
#compare: remove from @words the words existing in @tabStopWords
my %temp;

@temp{@tabStopWords} = 0 .. $#tabStopWords;

for my $val ( @words ) {

    if ( exists $temp{$val} ) {
        print "$val est présent dans tab1 à la position $temp{$val}.\n";
    }
    else {
        print "$val n'est pas dans tab1.\n";
        push @tab_sans_SW, $val;
    }
}

foreach my $value ( @tab_sans_SW ) {
    print $FhResultat "$value\n";
}

But the result file contains all the words in @words; the words that exist in @tabStopWords have not been removed. I hope you can help me.

My stop words file: ال الآن التي الذي الذين اللاتي اللائي اللتان اللتين

My text file: ومواصلات بما فيه من بريد ونور ومياه وصناعات وعلوم ومعارف وحينما يركب احدنا قطارا فإنه يركب في نفس الوقت على حرية جاهزة اعدها له آلاف العمال والمخترعين والمهندسين في

Upvotes: 1

Views: 85

Answers (3)

Sagar Motani

Reputation: 124

We can get the difference using the smart match operator (~~):

my(@words_arr) = ("is","a");
my(@input_arr) = ("This","is","a","example","code");
my (@diff)  = grep { not $_ ~~ @words_arr} @input_arr;
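
Note that smart match was reclassified as experimental in Perl 5.18 and warns under `use warnings`, so a hash lookup may be the safer way to get the same difference. A minimal sketch, reusing the arrays above; the name `%is_stop_word` is introduced here for illustration only:

```perl
use strict;
use warnings;

my @words_arr = ("is", "a");
my @input_arr = ("This", "is", "a", "example", "code");

# Build a lookup hash: every word to exclude becomes a key.
# (%is_stop_word is an illustrative name, not from the original answer.)
my %is_stop_word = map { $_ => 1 } @words_arr;

# Keep only the input words that are not keys of the lookup hash.
my @diff = grep { !$is_stop_word{$_} } @input_arr;

print "@diff\n";    # prints "This example code"
```

The hash is built once, so each lookup is a constant-time check rather than a scan of the whole exclusion list for every input word.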

Upvotes: 0

Borodin

Reputation: 126722

There are a couple of problems:

  • You don't chomp the contents of @tabStopWords, so each entry has a newline at the end

  • You overwrite the contents of @words each time around the while loop with @words = split(/ /, $_) instead of adding to it
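
The second point can be seen in isolation with a small sketch. The sample lines and the array names `@overwritten` and `@accumulated` are made up for illustration; they stand in for the lines read from the text file:

```perl
use strict;
use warnings;

# Made-up lines standing in for the contents of the text file.
my @lines = ("one two\n", "\n", "three four\n");

my @overwritten;
my @accumulated;

for my $line (@lines) {
    chomp $line;
    next if $line eq '';

    @overwritten = split / /, $line;        # replaces the previous contents
    push @accumulated, split / /, $line;    # appends to what is already there
}

print "@overwritten\n";    # prints "three four"
print "@accumulated\n";    # prints "one two three four"
```

With plain assignment only the words of the last non-empty line survive the loop, which is why most of the text never reaches the comparison.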

This program will do what you want. I have added use autodie to avoid having to check the result of every open, and I have removed a couple of unused variables. Local variable names are better written using just lower-case letters and underscores, especially for readers whose first language isn't English.

I've used split on both files to reduce them both to individual words. Because split also removes newline characters, there is no need for chomp.
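
As a quick check of that behaviour, here is a minimal sketch with a made-up input line (the variable names are illustrative only):

```perl
use strict;
use warnings;

# A made-up line as it would come from <$fh>, trailing newline included.
my $line = "  foo   bar\tbaz\n";

# With no arguments, split works on $_ and splits on runs of whitespace,
# skipping leading whitespace and dropping the trailing newline.
my @fields = map { split } $line;

print scalar(@fields), "\n";       # prints 3
print join('|', @fields), "\n";    # prints "foo|bar|baz"
```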

use strict;
use warnings 'all';
use autodie;

use constant FICHIER_STOP_WORD => '/home/lenovo/Bureau/MesTravaux/LeskAlgo/DemoLesk/StopWordsList.txt';
use constant FICHIER_TEXTE     => '/home/lenovo/Bureau/MesTravaux/LeskAlgo/text1.txt';
use constant FICHIER_RESULAT   => '/home/lenovo/Bureau/MesTravaux/LeskAlgo/OriginalLeskResult';


my @tab_stop_words = do {
    open my $fh1, "<:utf8", FICHIER_STOP_WORD;
    map { split } <$fh1>;
};

my @words = do {
    open my $fh1, "<:utf8", FICHIER_TEXTE;
    map { split } <$fh1>;
};

# Map each stop word to its position in the stop-word list
my %stop_words = map { $tab_stop_words[$_] => $_ } 0 .. $#tab_stop_words;

open my $fh_resultat, '>:utf8', FICHIER_RESULAT;

for my $word ( @words ) {

    my $position = $stop_words{$word};

    if ( defined $position ) {
        print "$word est présent dans tab1 à la position $position.\n";
    }
    else {
        print "$word n'est pas dans tab1.\n";
        print $fh_resultat "$word\n";
    }
}

Upvotes: 2

Dave Cross

Reputation: 69224

This problem would be easier to solve if you showed us the format of your two input files. But as you don't, this will be guesswork.

I guess that your file of stop words contains a single word on each line. In that case, each element in @tabStopWords and, therefore, each key in %temp will have a newline at the end. This makes it extremely unlikely that any of the words in your source file will match these keys.

You probably want to add:

chomp @tabStopWords;

to your code.
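
A minimal sketch of the effect, assuming a one-word-per-line stop-word file; the sample words and the names `%before` and `%after` are made up for illustration:

```perl
use strict;
use warnings;

# Made-up lines standing in for a one-word-per-line stop-word file.
my @stop_words = ("the\n", "and\n");

# Without chomp, the keys are "the\n" and "and\n", so "the" is not found.
my %before = map { $_ => 1 } @stop_words;
my $found_before = exists $before{the} ? "found" : "not found";

chomp @stop_words;    # strips the trailing newline from every element

# After chomp, the keys are "the" and "and".
my %after = map { $_ => 1 } @stop_words;
my $found_after = exists $after{the} ? "found" : "not found";

print "before chomp: $found_before\n";    # prints "before chomp: not found"
print "after chomp: $found_after\n";      # prints "after chomp: found"
```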

Upvotes: 1
