How to search for a pair of words taken from two lists

Question

I need to write a Perl script that finds time and location entities at the beginning of sentences in a French text and marks them with XML tags. For instance, En été ("in summer").

I have a list of location names in a CSV file and a list of moments (winter, summer, Monday, Tuesday, etc.) in a text file. I read the lists into two arrays, @topo and @tabtime, with one element per line of the original file.
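For reference, loading one of these lists could be sketched as follows. This is only a minimal sketch: read_list is a hypothetical helper name (it does not appear in the project code), and it assumes each file holds exactly one entry per line and is encoded in UTF-8.

```perl
use strict;
use warnings;

# Hypothetical helper: read a file into a list, one element per line,
# decoding the content as UTF-8 and skipping empty lines.
sub read_list {
    my ($path) = @_;
    open( my $fh, '<:encoding(UTF-8)', $path )
        or die "Cannot open $path: $!\n";
    chomp( my @lines = <$fh> );    # slurp all lines, strip the newlines
    close($fh);
    return grep { length } @lines;
}
```

It would be called as my @tabtime = read_list('entitemps.txt'); and my @topo = read_list('toponymes.csv');, using the file names from the full project code.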

I want to detect entities by searching for sentences that begin with En, À, Le, etc. ("in", "at", "the", etc.) and storing the results in @entite. Then I need to separate time entities from place entities: place entities will be stored in @places and time entities in @times.
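The first step, keeping only the sentences that start with one of these words, might look roughly like this. The sample sentences and the @starters array are assumptions made for illustration; the word list itself is taken from the regular expression in the full project code.

```perl
use strict;
use warnings;
use utf8;    # this source file contains accented literals

# Hypothetical sample sentences standing in for @phrases.
my @phrases = (
    'En été, nous partons à la mer',
    'Nous restons ici',
    'À Paris, il pleut',
    'Enfin, il arrive',
);

# Words that may introduce a time or place entity (illustrative array name).
my @starters   = qw(En À A Au Le Ce Du);
my $starter_re = join '|', map { quotemeta } @starters;

# ^\s* anchors the match at the start of the sentence. The lookahead (?=\s)
# requires whitespace after the word, so "Enfin" is not mistaken for "En".
my @entite = grep { /^\s*(?:$starter_re)(?=\s)/ } @phrases;
```

With this sample data, @entite keeps the "En été" and "À Paris" sentences only. Anchoring with ^ is what restricts the match to the beginning of the sentence; a bare \b would also match the same words in the middle of a sentence.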

My problem is how to find any of these starting words followed by any entry of @tabtime, with all the results stored in @times.

I was thinking of something like this but I'm missing some steps:

foreach my $celtime ( @entite ) {
    @times = ( grep(/\b@entites.@tabtime/)
}
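One possible way to fill in the missing steps is to turn each list into a single alternation and let grep test every sentence against the combined pattern. The sketch below is not the project's actual code: the sample data, the @starters array and the alternation helper are all assumptions made for illustration.

```perl
use strict;
use warnings;
use utf8;    # this source file contains accented literals

# Hypothetical sample data standing in for the arrays in the question.
my @entite = (
    'En été, nous partons',
    'À Paris, il pleut',
    'Le lundi, je travaille',
    'Ce livre est long',
);
my @tabtime  = qw(été hiver lundi mardi);
my @topo     = qw(Paris Lyon Marseille);
my @starters = qw(En À A Au Le Ce Du);

# Hypothetical helper: build one compiled alternation from a word list.
# Longer words come first so that a short word cannot shadow a longer one,
# and quotemeta protects any regex metacharacters inside the entries.
sub alternation {
    my @words = sort { length($b) <=> length($a) } @_;
    my $alt = join '|', map { quotemeta } @words;
    return qr/(?:$alt)/;
}

my $starter_re = alternation(@starters);
my $time_re    = alternation(@tabtime);
my $place_re   = alternation(@topo);

# A sentence is a time (or place) entity when it starts with a starter word
# followed by whitespace and an entry of the corresponding list.
my @times  = grep { /^\s*$starter_re\s+$time_re(?!\w)/ } @entite;
my @places = grep { /^\s*$starter_re\s+$place_re(?!\w)/ } @entite;

# Capture only the matched pair (e.g. "Le lundi") instead of the sentence.
my @time_entities =
    map { /^\s*($starter_re\s+$time_re)(?!\w)/ ? $1 : () } @entite;
```

With the sample data, @times holds the "En été" and "Le lundi" sentences and @places holds the "À Paris" sentence. Interpolating an array directly into a pattern, as in /\b@entite.@tabtime/, joins its elements with spaces rather than with |, which is why the alternation is built explicitly here.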

For your information this is the full project code:

my @phrases  = ();
my @topo     = ();
my @entite   = ();
my @tabplace = ();
my @tabtime  = ();
my $fichiertexte;
my $celplace;
my $fichiertemps = 'entitemps.txt';
my $fichiertopo  = 'toponymes.csv';
my $lignedic;
my $lignetemps;

print "Quel fichier voulez-vous segmentez ?\n";
$fichiertexte = <STDIN>;
chomp( $fichiertexte );

open( TEXT, "<:encoding(utf8)", $fichiertexte )
        or die( "Impossible d'ouvrir le fichier : ", $!, "\n" );
my @phrases = split( /\./, $lignetexte );  # One sentence per line
while ( $lignetexte = <TEXT> ) {
    chomp( $lignetexte );
    push( @phrases, $lignetexte );
}
close( TEXT );

open( TEMPS, "<:encoding(utf8)", $fichiertemps )
        or die( "Impossible d'ouvrir le fichier : ", $!, "\n" );
while ( $lignetemps = <TEMPS> ) {
    chomp( $lignetemps );
    push( @tabtime, $lignetemps );  # @tabtime = array of time names
}
close( TEMPS );

open( DICO, "<:encoding(utf8)", $fichiertopo )
        or die( "Impossible d'ouvrir le fichier : ", $!, "\n" );
while ( $lignedic = <DICO> ) {
    chomp( $lignedic );
    push( @topo, $lignedic );  # @topo = array of place names
}
close( DICO );

foreach my $cellule ( @phrases ) {
    if ( grep( /\b(En|En|A|À|Au|Le|Ce|Du|Au).+/, $cellule ) ) { # If the cell starts with the regular expression
        push( @entite, $cellule );
    }
}

foreach my $celplace ( @entite ) {

    #$cellieu = $cellieu.@dico
    @places = ( grep( /\b$cellieu/ . @topo );    # @places = array of place entities
}

foreach my $celtime ( @entite ) {
    @times = ( grep( /\b@entite.@tabtime/ ) );     # @times = array of time entities
}

foreach my $entitetemps ( @times ) {
    $entitelieu = ".$entitetemps.";
}

foreach my $entitelieu ( @places ) {
    $entitelieu = ".$entitetemps.";
}

close( TEXT );
