kbrin80

Reputation: 409

How can I extract the substring in the last set of parentheses using Perl?

I am using Perl to parse out sizes in a string. What is the regex that I could use to accomplish this:

Example Data: Sleepwell Mattress (Twin)
Magic Nite (Flip Free design) Mattress (Full XL)

Result: Twin
Full XL

I know that I need to start at the end of the string and parse out the first set of parentheses I find; I am just not sure how to do it.

#!/usr/bin/perl

$file = 'input.csv';

open (F, $file) || die ("Could not open $file!");

while ($line = <F>)
{
  ($field1,$field2,$field3,$field4,$field5,$field6,$field7, $field8, $field9) = split ',', $line;
  if ( $field1 =~ /^.*\((.*)\)/ ) {
  print $1;
}


#print "$field1,$field2,$field3,$field4,$field5,$field6,$field7, $field8, $field9, $1\n";
}

close (F);

Not getting any results. Maybe I am not doing this right.

Upvotes: 1

Views: 601

Answers (5)

ghostdog74

Reputation: 342591

A fancy regex is not really necessary here. Make it easier on yourself: split on " (" (a space followed by an opening parenthesis) and take the last element. Of course, this only works when the data you want is always at the end of the string and enclosed in parentheses.

while(<>){
    @a = split / \(/, $_;
    print $a[-1]; # get the last element. do your own trimming
}
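
To make the "do your own trimming" step concrete, here is a minimal sketch of the same split approach with the trimming added. The helper name last_paren_by_split is made up for illustration, and the sketch assumes the size is always the final parenthesized group and is preceded by a space.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the answer above): split on " (" and
# tidy up the last piece.
sub last_paren_by_split {
    my ($line) = @_;
    chomp $line;
    my @parts = split / \(/, $line;
    return if @parts < 2;      # no " (" found, so there is nothing to extract
    my $last = $parts[-1];
    $last =~ s/\)\s*$//;       # drop the closing parenthesis and trailing whitespace
    return $last;
}

print last_paren_by_split("Sleepwell Mattress (Twin)\n"), "\n";
print last_paren_by_split("Magic Nite (Flip Free design) Mattress (Full XL)\n"), "\n";
```

For the two sample lines from the question, this prints Twin and Full XL on separate lines.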

Upvotes: 0

Sinan Ünür

Reputation: 118148

The answer depends on whether the size information you are looking for always appears within parentheses at the end of the string. If that is the case, then your task is simple:

#!/usr/bin/perl

use strict;
use warnings;

while (<DATA> ) {
    last unless /\S/;
    my ($size) = /\( ( [^)]+ ) \)$/x;
    print "$size\n";
}

__DATA__
Sleepwell Mattress (Twin)
Magic Nite (Flip Free design) Mattress (Full XL)

Output:

C:\Temp> xxl
Twin
Full XL

Note that the code you posted can be better written as:

#!/usr/bin/perl

use strict;
use warnings;

my ($input_file) = @ARGV;

open my $input, '<', $input_file
    or die "Could not open '$input_file': $!";

while (my $line = <$input>) {
    chomp $line;
    my @fields = split /,/, $line;
    if ($fields[0] =~ /\( ( [^)]+ ) \)$/x ) {
        print $1;
    }
    print join('|', @fields), "\n";
}

close $input;

Also, you should consider using Text::xSV or Text::CSV_XS to process CSV files.

Upvotes: 5

Stoo

Reputation: 244

The following regular expression will match the content at the end of the string:

m/\(([^)]+)\)$/m

The m at the end enables multi-line mode, which changes $ to match at the end of each line, not only at the end of the string.

[edited to add the bit about multi-line strings]
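
As a rough illustration of what the /m modifier changes, the sketch below runs the same pattern with and without it against the question's two sample lines joined into a single string. The variable names are arbitrary, and the /g modifier is added here only to collect every match.

```perl
use strict;
use warnings;

# Sample data from the question, held in one multi-line string.
my $text = "Sleepwell Mattress (Twin)\n"
         . "Magic Nite (Flip Free design) Mattress (Full XL)\n";

# With /m, $ matches before every newline, so /g finds one size per line.
my @with_m = $text =~ /\(([^)]+)\)$/mg;

# Without /m, $ matches only at the very end of the string (or just
# before a final newline), so only the last line can match.
my @without_m = $text =~ /\(([^)]+)\)$/g;

print join(', ', @with_m), "\n";       # Twin, Full XL
print join(', ', @without_m), "\n";    # Full XL
```

If the input is read one line at a time, as in the question's loop, each string holds a single line and the /m modifier makes no practical difference.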

Upvotes: 2

Jez

Reputation: 30023

This is the answer as expressed in Perl 5:

my $str = "Magic Nite (Flip Free design) Mattress (Full XL)";
$str =~ m/.*\((.*)\)/;
print "$1\r\n";

Upvotes: -1

Daren Schwenke

Reputation: 5478

Assuming your data arrives line by line, and you are only interested in the contents of the last set of parens:

if ( $string =~ /^.*\((.*)\)/ ) {
  print $1;
}
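
The pattern above finds the last set because the leading .* is greedy. A small sketch of that behavior follows; the helper name last_paren_greedy and the third sample string are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the pattern shown above.
sub last_paren_greedy {
    my ($string) = @_;
    # The leading .* grabs as much as it can and backs off only until the
    # rest of the pattern fits, so \( lands on the last opening parenthesis
    # that still has a closing parenthesis after it.
    return $string =~ /^.*\((.*)\)/ ? $1 : undef;
}

my @samples = (
    'Sleepwell Mattress (Twin)',
    'Magic Nite (Flip Free design) Mattress (Full XL)',
    'Magic Nite (Full XL) Mattress',    # made-up case: text after the last group
);

for my $string (@samples) {
    my $size = last_paren_greedy($string);
    print "$size\n" if defined $size;
}
```

Because this pattern is not anchored with $, it still returns the last parenthesized group when other text follows it, as in the third sample.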

Upvotes: 0
