Reputation: 813
my $line = "Name:Amanda_Marry_Rose,Region:US,host:USE,cardType:DebitCard,product:Satin,Name:Raghav.S.Thomas,Region:UAE,";
my @name = ( $line =~ m/Name:([\w\s\_\,]+)/g );
foreach my $name (@name) {
    print $name."\n";
}
I want to capture the text between Name: and ,Region wherever it occurs in the line. The catch is that the name can be in any format:
Amanda_Marry_Rose
Amanda.Marry.Rose
Amanda Marry Rose
Amanda/Marry/Rose
I need help capturing this pattern every time it occurs in the line. For the line I provided, the output should be
Amanda_Marry_Rose
Raghav.S.Thomas
Does anyone have an idea how to do this? I tried the line below, but it gives me the wrong output:
@name=($line=~m/Name:([\w\s\!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\=\>\?\@\[\\\]\^\_\`\{\|\}\~\´]+)\,/g);
Output
Amanda_Marry_Rose,Region:US,host:USE,cardType:DebitCard,product:Satin,Name:Raghav.S.Thomas,Region:UAE
Upvotes: 1
Views: 90
Reputation: 126722
The simplest way is to look for every sequence of non-comma characters that follows an instance of Name: in the string.
use strict;
use warnings;
my $line = 'Name:Amanda_Marry_Rose,Region:US,host:USE,cardType:DebitCard,product:Satin,Name:Raghav.S.Thomas,Region:UAE,';
my @names = $line =~ /Name:([^,]+)/g;
print "$_\n" for @names;
output
Amanda_Marry_Rose
Raghav.S.Thomas
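As a quick check that the negated class copes with every name format listed in the question, here is a small sketch that runs the same pattern over a made-up line containing all four variants (the sample line is hypothetical test data, not taken from the question):

```perl
use strict;
use warnings;

# Hypothetical sample line: one record per name format from the question.
my $line = 'Name:Amanda_Marry_Rose,Region:US,'
         . 'Name:Amanda.Marry.Rose,Region:US,'
         . 'Name:Amanda Marry Rose,Region:US,'
         . 'Name:Amanda/Marry/Rose,Region:US,';

# [^,]+ accepts underscores, dots, spaces and slashes alike,
# because the only character it refuses is the delimiter itself.
my @names = $line =~ /Name:([^,]+)/g;
print "$_\n" for @names;
```

Because the class is defined by what it excludes, new name formats need no change to the pattern, as long as they never contain a comma.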
However, it may well be useful to parse the data into an array of hashes so that related fields are gathered together.
use strict;
use warnings;
my $line = 'Name:Amanda_Marry_Rose,Region:US,host:USE,cardType:DebitCard,product:Satin,Name:Raghav.S.Thomas,Region:UAE,';
my %info;
my @persons;
while ( $line =~ / ([a-z]+) : ([^:,]+) /gix ) {
my ($key, $val) = (lc $1, $2);
if (exists $info{$key}) {   # a repeated key starts a new record
push @persons, { %info };
%info = ();
}
$info{$key} = $val;
}
push @persons, { %info };
use Data::Dump;
dd \@persons;
print "\nNames:\n";
print "$_\n" for map $_->{name}, @persons;
output
[
{
cardtype => "DebitCard",
host => "USE",
name => "Amanda_Marry_Rose",
product => "Satin",
region => "US",
},
{
name => "Raghav.S.Thomas",
region => "UAE",
},
]
Names:
Amanda_Marry_Rose
Raghav.S.Thomas
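If you would rather avoid the regex loop, a similar grouping can be sketched with split: break the line on commas, then split each field on its first colon only. This is an alternative, not the method above, and it assumes keys never contain commas or colons; like the version above, it treats a repeated key as the start of the next record.

```perl
use strict;
use warnings;

my $line = 'Name:Amanda_Marry_Rose,Region:US,host:USE,cardType:DebitCard,product:Satin,Name:Raghav.S.Thomas,Region:UAE,';

my (@persons, %info);
for my $field (split /,/, $line) {
    # Limit of 2: split only on the first colon, so a value that
    # itself contains ':' is kept whole.
    my ($key, $val) = split /:/, $field, 2;
    next unless defined $val;    # skip fields without a colon
    $key = lc $key;
    # A repeated key (e.g. a second "name") starts a new record.
    if (exists $info{$key}) {
        push @persons, {%info};
        %info = ();
    }
    $info{$key} = $val;
}
push @persons, {%info} if %info;

print "$_->{name} ($_->{region})\n" for @persons;
```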
Upvotes: 0
Reputation: 67900
What you have there is comma-separated data. How you should parse it depends a lot on your data. If it is full-fledged CSV data, the safest approach is to use a proper CSV parser, such as Text::CSV. If it is less strict, you can get away with the lightweight parser Text::ParseWords, which also has the benefit of being a core module in Perl 5. If what you have here is fairly basic, user-entered fields, I would recommend split, simply because when you know the delimiter, it is easier and safer to define the delimiter than everything that can appear between delimiters.
use strict;
use warnings;
use Data::Dumper;
my $line = "Name:Amanda_Marry_Rose,Region:US,host:USE,cardType:DebitCard,product:Satin,Name:Raghav.S.Thomas,Region:UAE,";
# Simple split
my @fields = split /,/, $line;
print Dumper map /^Name:(.*)/, @fields;
use Text::ParseWords;
print Dumper map /^Name:(.*)/, quotewords(',', 0, $line);
use Text::CSV;
my $csv = Text::CSV->new({
binary => 1,
});
$csv->parse($line);
print Dumper map /^Name:(.*)/, $csv->fields;
Each of these options gives the same output, except the one that uses Text::CSV, which also issues an "uninitialized value" warning, quite correctly, because your data has a trailing comma (meaning an empty field at the end).
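The trailing comma is handled differently by each tool. split, for example, silently drops trailing empty fields unless you pass a negative limit. A small sketch of the difference:

```perl
use strict;
use warnings;

my $line = "Name:Raghav.S.Thomas,Region:UAE,";

# Default: trailing empty fields are discarded.
my @default = split /,/, $line;

# Negative limit: trailing empty fields are preserved.
my @keep_all = split /,/, $line, -1;

printf "default: %d fields, with -1: %d fields\n",
    scalar @default, scalar @keep_all;
```

This is why the split version above never sees the empty last field.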
Each of these has different strengths and weaknesses. Text::CSV can choke on data that does not conform to the CSV format, and split cannot handle embedded commas, such as Name:"Doe, John",...
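To see the embedded-comma problem concretely, here is a sketch with a hypothetical quoted field. split cuts the name in two, while quotewords from the core Text::ParseWords module treats the double-quoted text as part of one field (with 0 as the keep argument, the quotes themselves are removed):

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Hypothetical input with a comma inside a quoted value.
my $line = 'Name:"Doe, John",Region:US';

my @from_split = map { /^Name:(.*)/ } split /,/, $line;
my @from_words = map { /^Name:(.*)/ } quotewords(',', 0, $line);

print "split:      $from_split[0]\n";   # the name is cut at the comma
print "quotewords: $from_words[0]\n";
```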
The regex we use to extract the names simply captures the entire rest of each field that begins with Name:. This also allows you to perform sanity checks on the field names, for example issuing a warning if you suddenly find a field called Doe;Name:
Upvotes: 0
Reputation:
Corrected regex:
my $line = "Name:Amanda_Marry_Rose,Region:US,host:USE,cardType:DebitCard,product:Satin,Name:Raghav.S.Thomas,Region:UAE,";
my @name = ($line =~ /Name\:([\w\s_.\/]+)\,/g);
foreach my $name (@name) {
print $name."\n";
}
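A character-class whitelist like the one above only matches names built from the listed characters. A sketch with a hypothetical hyphenated name shows how it differs from a negated character class:

```perl
use strict;
use warnings;

# Hypothetical line: the first name contains a hyphen,
# which is not in the whitelist.
my $line = 'Name:Mary-Jane Smith,Region:US,Name:Raghav.S.Thomas,Region:UAE,';

my @whitelist = $line =~ /Name:([\w\s_.\/]+),/g;  # whitelist pattern from this answer
my @negated   = $line =~ /Name:([^,]+)/g;         # negated character class

print "whitelist: @whitelist\n";
print "negated:   @negated\n";
```

The whitelist silently skips "Mary-Jane Smith", so every new character that can appear in a name has to be added to the class by hand.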
Upvotes: 0
Reputation: 7912
To capture between Name: and the first comma, use a negated character class:
/Name:([^,]+)/g
This matches one or more characters following Name: that are not commas:
while (/Name:([^,]+)/g) {
print $1, "\n";
}
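The loop above reads from $_. To run it against a named variable such as $line, bind it with =~. A short sketch, which also uses the built-in @- array to report where each captured name starts (the offset reporting is only an illustration):

```perl
use strict;
use warnings;

my $line = 'Name:Amanda_Marry_Rose,Region:US,host:USE,cardType:DebitCard,product:Satin,Name:Raghav.S.Thomas,Region:UAE,';

# In scalar context, //g resumes from pos($line) on each iteration,
# so the loop visits one match at a time.
my @found;
while ($line =~ /Name:([^,]+)/g) {
    # $-[1] is the offset where capture group 1 started.
    push @found, sprintf "%s at offset %d", $1, $-[1];
}
print "$_\n" for @found;
```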
This is more efficient than a non-greedy quantifier, e.g.:
/Name:(.+?),/g
because it doesn't require backtracking.
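Besides performance, the two patterns differ in what they require: the non-greedy version only matches when a comma actually follows the name. A sketch with a hypothetical line whose last field has no trailing comma:

```perl
use strict;
use warnings;

my $with_comma    = 'Name:Raghav.S.Thomas,Region:UAE,';
my $without_comma = 'Region:UAE,Name:Raghav.S.Thomas';   # hypothetical

my @results;
for my $line ($with_comma, $without_comma) {
    my @negated = $line =~ /Name:([^,]+)/g;
    my @lazy    = $line =~ /Name:(.+?),/g;
    push @results, [scalar @negated, scalar @lazy];
    printf "negated: %d match(es), non-greedy: %d match(es)\n",
        scalar @negated, scalar @lazy;
}
```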
Upvotes: 3