YD8877

Reputation: 10790

Regular expression for fetching text

I have a text file which contains names enclosed in single quotes. How do I write a regex to get all the names in the text?

- "Lady of Spain" (uncredited)
  Music by 'Tolchard Evans' (qv)
  Lyrics by 'Robert Hargreaves (II)' (qv), 'Stanley Damerell' (qv) and 'Henry B. Tilsley' (qv)
  Performed by 'Jack Haig' (qv) and 'Kenneth Connor' (qv)

Here is what I could come up with.

/(\'(.*)\')*/

However, the period only matches up to the newline, so I modified the regex to:

/(\'(.*)\'.*(\n|\r\n)*)*/

But it's still not working. Please help me figure out why my regex isn't working.
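One likely reason the first attempt appears to do nothing: the * quantifier applies to the whole quoted group, and zero repetitions is a valid match. The sketch below (the sample line is taken from the question; the variable names are only for illustration) shows that the pattern succeeds immediately at position 0 with an empty match, so the capture groups are never set.

```perl
use strict;
use warnings;

my $line = q{  Music by 'Tolchard Evans' (qv)};

# The * applies to the whole group, so "zero repetitions" satisfies it.
# At position 0 the line starts with a space, \' cannot match there, and
# the engine settles for an empty match instead of moving on to the quote.
my $matched  = $line =~ /(\'(.*)\')*/;
my $captured = defined $2 ? $2 : '(nothing)';

print "matched: ", ($matched ? "yes" : "no"), ", captured: $captured\n";
# prints "matched: yes, captured: (nothing)"
```

Adding newline handling to the pattern does not change this, because the empty match at the start of the line is still accepted first.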

Upvotes: 1

Views: 143

Answers (4)

Fredrik Pihl

Reputation: 45662

I'd use split instead:

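#!/usr/bin/env perl
use strict;
use warnings;

while (<DATA>) {
    chomp;
    # The capturing group makes split keep the quoted strings in the result list
    my @values = split /('.*?')/;
    foreach my $val (@values) {
        # Print only the pieces that are quoted names
        print "$val\n" if $val =~ m/^'/;
    }
}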

__DATA__
- "Lady of Spain" (uncredited)
  Music by 'Tolchard Evans' (qv)
  Lyrics by 'Robert Hargreaves (II)' (qv), 'Stanley Damerell' (qv) and 'Henry B. Tilsley' (qv)
  Performed by 'Jack Haig' (qv) and 'Kenneth Connor' (qv)

Output:

'Tolchard Evans'
'Robert Hargreaves (II)'
'Stanley Damerell'
'Henry B. Tilsley'
'Jack Haig'
'Kenneth Connor'
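This works because when the split pattern contains a capturing group, Perl also returns the captured separators as list elements. A small sketch on one of the sample lines (the variable names are only for illustration):

```perl
use strict;
use warnings;

my $line = q{  Music by 'Tolchard Evans' (qv)};

# With a capturing group, the separators (the quoted names) are kept.
my @pieces = split /('.*?')/, $line;
# @pieces is ("  Music by ", "'Tolchard Evans'", " (qv)")

# Without the group, the separators are thrown away.
my @without = split /'.*?'/, $line;
# @without is ("  Music by ", " (qv)")

print "[$_]\n" for @pieces;
```

The m/^'/ test in the loop then picks out the elements that came from the separators.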

Upvotes: 3

TLP

Reputation: 67900

You do not need to match newlines for input like this. I think your problem lies not so much with the regex as with how you process your data: as long as your single-quoted strings do not contain a newline, you do not need to compensate for it.

Try this one-liner, for example:

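perl -nwE '$,="\n"; say /\x27([^\x27]+)\x27/g' quotes.txt

(\x27 is the single quote character; a literal ' cannot appear inside a single-quoted shell argument, and \' does not escape it there.)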

As you can see, I use the global option /g to get all the matches from each line.

Further explanations:

  • -n : assume a while (<>) loop around the program (to get input from the file)
  • -w : enable warnings
  • -E : one-line program, with all optional features enabled (e.g. say)
  • $, : set the OUTPUT_FIELD_SEPARATOR to newline, so that all matches are separated by newline.
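Written out as a script, the one-liner corresponds roughly to the sketch below. It assumes the input is held in memory (a hypothetical stand-in for quotes.txt) so that it runs as-is, and the helper name names_in_line is made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';

# Every single-quoted name on one line. In list context, m//g returns the
# capture from every match, not just the first one.
sub names_in_line {
    my ($line) = @_;
    return $line =~ /'([^']+)'/g;
}

# Sample input kept in memory (stands in for quotes.txt).
my $text = <<'END';
- "Lady of Spain" (uncredited)
  Music by 'Tolchard Evans' (qv)
  Performed by 'Jack Haig' (qv) and 'Kenneth Connor' (qv)
END

open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
{
    local $, = "\n";               # OUTPUT_FIELD_SEPARATOR, printed between list items
    while (my $line = <$fh>) {     # the loop that -n wraps around the program
        my @names = names_in_line($line);
        say @names if @names;      # the one-liner also prints blank lines; this skips them
    }
}
close $fh;
```

The only deliberate difference from the one-liner is the "if @names" guard, which suppresses the empty lines printed for input lines without any quoted names.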

If you have the whole text file in a string, try this:

my @matches = $string =~ /'([^']+)'/g;
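A minimal sketch of the whole-file variant, assuming the file is read in one go by localizing $/ (the input record separator). The sub names slurp and extract_names and the file name quotes.txt are hypothetical. Note that [^'] also matches a newline, so on a whole-file string a stray apostrophe could pair with a quote on a later line; [^'\n] would keep each match on one line.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Read a whole file into one string by undefining $/ for this scope.
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "Couldn't open $path: $!";
    local $/;
    my $content = <$fh> // '';
    close $fh;
    return $content;
}

# All single-quoted names in a string, in order of appearance.
sub extract_names {
    my ($string) = @_;
    my @matches = $string =~ /'([^']+)'/g;
    return @matches;
}

# Usage: perl names.pl quotes.txt
if (@ARGV) {
    print "$_\n" for extract_names(slurp($ARGV[0]));
}
```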

Upvotes: 1

Xtroce

Reputation: 1824

You can use this:

use strict;
use warnings;

# Read the whole file into one string
open my $fh, '<', 'myfile' or die "Couldn't open file: $!";
my $string = do { local $/; <$fh> };
close $fh;

# Collect each quoted name once, in order of first appearance
my %seen;
my @array;
while ($string =~ m/'(.*?)'/g) {
    push @array, $1 unless $seen{$1}++;
}

# Print the names, one per line
print "$_\n" for @array;

Upvotes: 0

Toto

Reputation: 91415

Use a non-greedy quantifier:

/'(.*?)'/

or

/'([^']*)'/
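The difference between a greedy .* and the two forms above is easiest to see on a line with two names. A minimal comparison on one of the sample lines (the variable names are only for illustration):

```perl
use strict;
use warnings;

my $line = q{  Performed by 'Jack Haig' (qv) and 'Kenneth Connor' (qv)};

# Greedy: .* runs on to the LAST quote, so one match spans both names.
my @greedy  = $line =~ /'(.*)'/g;
# Non-greedy: .*? stops at the FIRST closing quote.
my @lazy    = $line =~ /'(.*?)'/g;
# Negated class: [^']* can never cross a quote at all.
my @negated = $line =~ /'([^']*)'/g;

print "greedy:  $_\n" for @greedy;    # greedy:  Jack Haig' (qv) and 'Kenneth Connor
print "lazy:    $_\n" for @lazy;      # lazy:    Jack Haig / Kenneth Connor (two lines)
print "negated: $_\n" for @negated;   # negated: Jack Haig / Kenneth Connor (two lines)
```

Both fixes give the same result here; the negated class does not need to backtrack and states the intent ("anything but a quote") directly.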

Upvotes: 0
