Reputation: 409
I am using Perl to parse out sizes in a string. What is the regex that I could use to accomplish this:
Example Data:
Sleepwell Mattress (Twin)
Magic Nite (Flip Free design) Mattress (Full XL)
Result:
Twin
Full XL
I know that I need to start at the end of the string and parse out the first set of parentheses, but I'm not sure how to do it.
#!/usr/bin/perl
$file = 'input.csv';
open (F, $file) || die ("Could not open $file!");
while ($line = <F>)
{
($field1,$field2,$field3,$field4,$field5,$field6,$field7, $field8, $field9) = split ',', $line;
if ( $field1 =~ /^.*\((.*)\)/ ) {
print $1;
}
#print "$field1,$field2,$field3,$field4,$field5,$field6,$field7, $field8, $field9, $1\n";
}
close (F);
Not getting any results. Maybe I am not doing this right.
Upvotes: 1
Views: 601
Reputation: 342591
A fancy regex is not really necessary here; make it easier on yourself. You can split on " (" (a space followed by an opening parenthesis) and take the last element. Of course, this only works when the data you want is always at the end of the line and is enclosed in parentheses.
while(<>){
@a = split / \(/, $_;
print $a[-1]; # get the last element. do your own trimming
}
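The trimming left as an exercise above could look something like the following. This is only a sketch: `last_paren_field` is a hypothetical helper name, and it assumes the size is the last parenthesised group on the line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper for this sketch: return the text after the last " ("
# with the closing ")" removed, or nothing if the line has no " (".
sub last_paren_field {
    my ($line) = @_;
    chomp $line;
    my @parts = split / \(/, $line;
    return if @parts < 2;           # no " (" found on this line
    my $last = $parts[-1];
    $last =~ s/\)\s*$//;            # trim the closing parenthesis and trailing whitespace
    return $last;
}

for my $line ("Sleepwell Mattress (Twin)\n",
              "Magic Nite (Flip Free design) Mattress (Full XL)\n") {
    my $size = last_paren_field($line);
    print "$size\n" if defined $size;
}
```

For the two sample lines this prints Twin and Full XL, one per line.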
Upvotes: 0
Reputation: 118148
The answer depends on whether the size information you are looking for always appears within parentheses at the end of the string. If that is the case, then your task is simple:
#!/usr/bin/perl
use strict;
use warnings;
while (<DATA> ) {
last unless /\S/;
my ($size) = /\( ( [^)]+ ) \)$/x;
print "$size\n";
}
__DATA__
Sleepwell Mattress (Twin)
Magic Nite (Flip Free design) Mattress (Full XL)
Output:
C:\Temp> xxl
Twin
Full XL
Note that the code you posted can be better written as:
#!/usr/bin/perl
use strict;
use warnings;
my ($input_file) = @ARGV;
open my $input, '<', $input_file
or die "Could not open '$input_file': $!";
while (my $line = <$input>) {
chomp $line;
my @fields = split /,/, $line;
if ($fields[0] =~ /\( ( [^)]+ ) \)$/x ) {
print $1;
}
print join('|', @fields), "\n";
}
close $input;
Also, you should consider using Text::xSV or Text::CSV_XS to process CSV files.
Upvotes: 5
Reputation: 244
The following regular expression will match the content at the end of the string:
m/\(([^)]+)\)$/m
The /m modifier at the end treats the string as multiple lines: it changes $ to match at the end of each line, not only at the end of the string.
[edited to add the bit about multi-line strings]
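To see what the /m modifier changes, here is a small sketch that runs the same pattern with and without it. The assumption (not part of the question) is that both sample lines have been read into a single string.

```perl
use strict;
use warnings;

# Both sample lines held in one string (an assumption made for this demo).
my $text = "Sleepwell Mattress (Twin)\n"
         . "Magic Nite (Flip Free design) Mattress (Full XL)\n";

# Without /m, $ matches only at the end of the string
# (or just before a final newline).
my @without_m = $text =~ /\(([^)]+)\)$/g;

# With /m, $ also matches just before every embedded newline.
my @with_m = $text =~ /\(([^)]+)\)$/mg;

print "without /m: ", join(', ', @without_m), "\n";   # without /m: Full XL
print "with /m: ",    join(', ', @with_m),    "\n";   # with /m: Twin, Full XL
```

If you read the file line by line, as in the question, each string holds a single line and the modifier makes no difference.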
Upvotes: 2
Reputation: 30023
This is the answer as expressed in Perl5:
my $str = "Magic Nite (Flip Free design) Mattress (Full XL)";
$str =~ m/.*\((.*)\)/;
print "$1\r\n";
Upvotes: -1
Reputation: 5478
Assuming your data arrives line by line, and you are only interested in the contents of the last set of parens:
if ( $string =~ /^.*\((.*)\)/ ) {
print $1;
}
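The reason this picks up the last set of parentheses is that the leading .* is greedy. A minimal sketch comparing it with the same pattern minus the leading .* (the variable names are just for illustration):

```perl
use strict;
use warnings;

my $string = "Magic Nite (Flip Free design) Mattress (Full XL)";

# The leading greedy .* consumes as much as it can, so \( ends up
# matching the last "(" in the string.
my ($last) = $string =~ /^.*\((.*)\)/;

# Without the leading .*, the match starts at the first "(" and the
# greedy capture runs through to the last ")".
my ($too_much) = $string =~ /\((.*)\)/;

print "$last\n";       # Full XL
print "$too_much\n";   # Flip Free design) Mattress (Full XL
```

Note that the capture (.*) is greedy too, so if anything containing a ")" follows the size, using [^)]* inside the capture is probably the safer choice.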
Upvotes: 0