user3766874

Reputation: 813

Pattern match in Perl

my $line = "Name:Amanda_Marry_Rose,Region:US,host:USE,cardType:DebitCard,product:Satin,Name:Raghav.S.Thomas,Region:UAE,";
my $name = "";

@name = ( $line =~ m/Name:([\w\s\_\,/g );
foreach (@name) {
   print $name."\n";
}

I want to capture the text between Name: and ,Region wherever it occurs in the line. The main difficulty is that the name can be in any format:

Amanda_Marry_Rose
Amanda.Marry.Rose
Amanda Marry Rose
Amanda/Marry/Rose

I need help capturing this pattern every time it occurs in the line. For the line above, the output should be:

Amanda_Marry_Rose
Raghav.S.Thomas

Does anyone have an idea how to do this? I tried the line below, but it gives the wrong output:

@name=($line=~m/Name:([\w\s\!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\=\>\?\@\[\\\]\^\_\`\{\|\}\~\´]+)\,/g);

Output

Amanda_Marry_Rose,Region:US,host:USE,cardType:DebitCard,product:Satin,Name:Raghav.S.Thomas,Region:UAE

Upvotes: 1

Views: 90

Answers (4)

Borodin

Reputation: 126722

The simple way is to look for all sequences of non-comma characters after every instance of Name: in the string.

use strict;
use warnings;

my $line = 'Name:Amanda_Marry_Rose,Region:US,host:USE,cardType:DebitCard,product:Satin,Name:Raghav.S.Thomas,Region:UAE,';

my @names = $line =~ /Name:([^,]+)/g;

print "$_\n" for @names;

output

Amanda_Marry_Rose
Raghav.S.Thomas
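To check that the negated character class copes with every name format listed in the question, here is a small sketch. The test line is invented for illustration by combining those four formats; it is not data from the question.

```perl
use strict;
use warnings;

# Hypothetical test line: one record for each name format listed in the question
my $line = join ',', map { "Name:$_,Region:US" }
    'Amanda_Marry_Rose', 'Amanda.Marry.Rose', 'Amanda Marry Rose', 'Amanda/Marry/Rose';

# [^,]+ takes every character up to the next comma, whatever those characters are
my @names = $line =~ /Name:([^,]+)/g;

print "$_\n" for @names;
```

Because the pattern describes the delimiter rather than the name, underscores, dots, spaces and slashes all pass through unchanged.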

However, it may well be useful to parse the data into an array of hashes so that related fields are gathered together.

use strict;
use warnings;

my $line = 'Name:Amanda_Marry_Rose,Region:US,host:USE,cardType:DebitCard,product:Satin,Name:Raghav.S.Thomas,Region:UAE,';

my %info;
my @persons;
while ( $line =~ / ([a-z]+) : ([^:,]+) /gix ) {

   my ($key, $val) = (lc $1, $2);

   if (exists $info{$key}) {   # a repeated key starts a new record
     push @persons, { %info };
     %info = ();
   }

   $info{$key} = $val;
}
push @persons, { %info };

use Data::Dump;
dd \@persons;

print "\nNames:\n";
print "$_\n" for map $_->{name}, @persons;

output

[
  {
    cardtype => "DebitCard",
    host     => "USE",
    name     => "Amanda_Marry_Rose",
    product  => "Satin",
    region   => "US",
  },
  {
    name   => "Raghav.S.Thomas",
    region => "UAE",
  },
]

Names:
Amanda_Marry_Rose
Raghav.S.Thomas
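The array of hashes pays off once you need more than the names. As a sketch using only core modules, here is the same parsing loop followed by a lookup of everyone in a given region. It uses `exists` rather than a truth test, so a repeated key is still detected when its value happens to be false (such as `0`); the region filter itself is only an illustration.

```perl
use strict;
use warnings;

my $line = 'Name:Amanda_Marry_Rose,Region:US,host:USE,cardType:DebitCard,product:Satin,Name:Raghav.S.Thomas,Region:UAE,';

# A key we have already seen marks the start of the next person's record
my (%info, @persons);
while ($line =~ / ([a-z]+) : ([^:,]+) /gix) {
    my ($key, $val) = (lc $1, $2);
    if (exists $info{$key}) {
        push @persons, { %info };
        %info = ();
    }
    $info{$key} = $val;
}
push @persons, { %info } if %info;

# Look up people by region; // '' guards against records without a region
my @uae = map  { $_->{name} }
          grep { ($_->{region} // '') eq 'UAE' } @persons;

print "$_\n" for @uae;    # prints Raghav.S.Thomas
```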

Upvotes: 0

TLP

Reputation: 67900

What you have there is comma-separated data. How you should parse it depends a lot on your data. If it is full-fledged CSV data, the safest approach is to use a proper CSV parser, such as Text::CSV. If it is less strict, you can get away with the lightweight parser Text::ParseWords, which also has the benefit of being a core module in Perl 5. If what you have here is basic, user-entered fields, then I would recommend split, simply because when you know the delimiter, it is easier and safer to define the delimiter than to define everything that may appear between delimiters.

use strict;
use warnings;
use Data::Dumper;

my $line = "Name:Amanda_Marry_Rose,Region:US,host:USE,cardType:DebitCard,product:Satin,Name:Raghav.S.Thomas,Region:UAE,";

# Simple split
my @fields = split /,/, $line;
print Dumper map /^Name:(.*)/, @fields;

use Text::ParseWords;
print Dumper map /^Name:(.*)/, quotewords(',', 0, $line);

use Text::CSV;
my $csv = Text::CSV->new({
        binary => 1,
    });
$csv->parse($line);
print Dumper map /^Name:(.*)/, $csv->fields;

Each of these options gives the same output, except the one that uses Text::CSV, which also issues an uninitialized-value warning. That warning is correct, because your data has a trailing comma (meaning an empty field at the end).
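The trailing comma matters for split too: by default, Perl's split discards trailing empty fields, so the empty field after the final comma never reaches the map. Passing a negative limit keeps it. A quick check on the sample line:

```perl
use strict;
use warnings;

my $line = 'Name:Amanda_Marry_Rose,Region:US,host:USE,cardType:DebitCard,product:Satin,Name:Raghav.S.Thomas,Region:UAE,';

# Default split: trailing empty fields are removed
my @default = split /,/, $line;

# Negative limit: trailing empty fields are kept, including the one after the last comma
my @all = split /,/, $line, -1;

printf "default: %d fields, with -1: %d fields\n", scalar @default, scalar @all;
```

This prints `default: 7 fields, with -1: 8 fields`.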

Each of these has different strengths and weaknesses. Text::CSV can choke on data that does not conform to the CSV format, and split cannot handle embedded commas, such as Name:"Doe, John",....

The regex used to extract the names simply captures the rest of each field that begins with Name:. Working field by field also lets you perform sanity checks on the field names, for example issuing a warning if you suddenly find a field called Doe;Name:.
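A sketch of such a sanity check, splitting on commas and then splitting each field on its first colon. The `%known` list of field names is a hypothetical whitelist built from the sample line; adjust it to whatever fields your data really has.

```perl
use strict;
use warnings;

my $line = 'Name:Amanda_Marry_Rose,Region:US,host:USE,cardType:DebitCard,product:Satin,Name:Raghav.S.Thomas,Region:UAE,';

# Hypothetical whitelist of the field names we expect to see
my %known = map { $_ => 1 } qw(Name Region host cardType product);

my @names;
for my $field (split /,/, $line) {
    # Limit of 2: split on the first colon only, so values may contain colons
    my ($key, $value) = split /:/, $field, 2;

    # Fields without a colon, or with an unexpected name such as "Doe;Name", are reported
    unless (defined $value and $known{$key}) {
        warn "Unexpected field: '$field'\n";
        next;
    }
    push @names, $value if $key eq 'Name';
}

print "$_\n" for @names;
```

For the sample line this prints the two names and no warnings; a field like `Doe;Name:Smith` would produce a warning instead of silently disappearing.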

Upvotes: 0

user3999721

Reputation:

Corrected regex:

use strict;
use warnings;

my $line = "Name:Amanda_Marry_Rose,Region:US,host:USE,cardType:DebitCard,product:Satin,Name:Raghav.S.Thomas,Region:UAE,";

# Allow word characters (which include _), whitespace, dots and slashes in the name
my @name = ($line =~ /Name:([\w\s.\/]+),/g);
foreach my $name (@name) {
    print $name."\n";
}
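Note that this regex accepts only the characters it lists (word characters, whitespace, dots and slashes). A name containing anything else, such as a hyphen, is skipped silently rather than reported. The hyphenated name below is an invented example to show the difference from the negated class `[^,]+` used in the other answers:

```perl
use strict;
use warnings;

# Hypothetical input with a hyphen in the second name
my $line = 'Name:Amanda_Marry_Rose,Region:US,Name:Mary-Jane Smith,Region:UK,';

# Whitelist: only \w, whitespace, '.' and '/' may appear in the name
my @whitelist = $line =~ /Name:([\w\s.\/]+),/g;

# Negated class: anything except a comma
my @negated = $line =~ /Name:([^,]+)/g;

print "whitelist: @whitelist\n";    # only Amanda_Marry_Rose
print "negated:   @negated\n";      # both names
```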

Upvotes: 0

RobEarl

Reputation: 7912

To capture between Name: and the first comma, use a negated character class:

/Name:([^,]+)/g

This matches one or more characters after Name: that are not commas:

while ($line =~ /Name:([^,]+)/g) {
    print $1, "\n";
}

This is more efficient than a non-greedy quantifier, e.g.:

/Name:(.+?),/g

because the negated class doesn't require backtracking.
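If you want to measure the difference on your own data, the core Benchmark module can compare the two patterns. This is only a sketch: the repeated test line is made up, and the timings you get will depend on your Perl version and input, since the regex engine's optimizations can narrow the gap.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical longer input: the sample record repeated 1000 times
my $line = 'Name:Amanda_Marry_Rose,Region:US,host:USE,cardType:DebitCard,product:Satin,' x 1000;

my @negated = $line =~ /Name:([^,]+)/g;
my @lazy    = $line =~ /Name:(.+?),/g;

# On well-formed input both patterns extract the same names
die "results differ\n" unless "@negated" eq "@lazy";

# Run each pattern for at least 1 CPU second and print a comparison table
cmpthese(-1, {
    negated_class => sub { my @n = $line =~ /Name:([^,]+)/g },
    non_greedy    => sub { my @n = $line =~ /Name:(.+?),/g },
});
```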

Upvotes: 3
