Reputation: 13
I have a file whose lines come in one of the following two formats:
USMS 1362224754632|<REQ MSISDN="00966590832186" CONTRACT="580" SUBSCRIPTION="AAA" FORMAT="ascii" TEXT="L2"
or
USMS 1362224754632|<REQ MSISDN="00966590832186" CONTRACT="580" SUBSCRIPTION="BBB" THRESHOLDID="1" FORMAT="ascii" TEXT="L2"
I am trying to parse it with Perl to get the values in the following form:
1362224754632;00966590832186;580;AAA;L2
Below is the code:
if($Record =~ /USMS (.*?)|<REQ MSISDN="(.*?)" CONTRACT="(.*?)" SUBSCRIPTION="(.*?)" FORMAT="(.*?)" THRESHOLDID="(.*?)" TEXT="(.*?)"/)
{
print LOGFILE "$1;$2;$3;$4;$5;$6;$7\n";
}
elsif($Record =~ /USMS (.*?)|<REQ MSISDN="(.*?)" CONTRACT="(.*?)" SUBSCRIPTION="(.*?)" FORMAT="(.*?)" TEXT="(.*?)"/)
{
print LOGFILE "$1;$2;$3;$4;$5;$6\n";
}
But I always get:
;;;;;
Upvotes: 1
Views: 239
Reputation: 126722
It looks like all you want are the values enclosed in double quotes.
That can be done like this:
use strict;
use warnings;
while (<DATA>) {
    # In list context, //g returns the text captured between each pair of double quotes
    my @values = /"([^"]+)"/g;
    print join(';', @values), "\n";
}
__DATA__
USMS 1362224754632|<REQ MSISDN="00966590832186" CONTRACT="580" SUBSCRIPTION="AAA" FORMAT="ascii" TEXT="L2"
USMS 1362224754632|<REQ MSISDN="00966590832186" CONTRACT="580" SUBSCRIPTION="BBB" THRESHOLDID="1" FORMAT="ascii" TEXT="L2"
Output:
00966590832186;580;AAA;ascii;L2
00966590832186;580;BBB;1;ascii;L2
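The output above leaves out the leading USMS number that the question's desired output starts with. If that number is needed too, a minimal sketch that captures it alongside the quoted values might look like this. The sub name quoted_values_with_id is hypothetical, the sample lines are inlined instead of read from DATA, and ([^"]*) is used in place of ([^"]+) so that an empty value such as "" comes through as an empty field instead of confusing the match.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the number after "USMS" followed by
# every double-quoted value on the line, in the order they appear.
sub quoted_values_with_id {
    my ($line) = @_;
    my ($id) = $line =~ /^USMS (\d+)\|/;   # \| matches a literal pipe
    # ([^"]*) rather than ([^"]+) so an empty value "" is kept as an empty field
    my @values = $line =~ /"([^"]*)"/g;    # list-context //g returns all captures
    return defined $id ? ($id, @values) : @values;
}

my @lines = (
    'USMS 1362224754632|<REQ MSISDN="00966590832186" CONTRACT="580" SUBSCRIPTION="AAA" FORMAT="ascii" TEXT="L2"',
    'USMS 1362224754632|<REQ MSISDN="00966590832186" CONTRACT="580" SUBSCRIPTION="BBB" THRESHOLDID="1" FORMAT="ascii" TEXT="L2"',
);
print join(';', quoted_values_with_id($_)), "\n" for @lines;
```

For the first sample line this prints 1362224754632;00966590832186;580;AAA;ascii;L2. Unlike the output asked for in the question, it still includes the FORMAT value, because it takes every quoted value without looking at the key names.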
Upvotes: 0
Reputation: 164809
Instead of using a single regex, I would first split the record into its two parts (the USMS id and the request), then handle each part separately.
my($usms_part, $request) = split / \s* \|<REQ \s* /x, $Record;
my($usms_id) = $usms_part =~ /^USMS (\d+)$/;
my %request;
while( $request =~ /(\w+)="(.*?)"/g ) {
$request{$1} = $2;
}
Rather than hard-coding every possible key/value pair and every possible ordering, you can parse them all generically with one piece of code.
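Put together into a complete program, the split-then-parse approach might look like the following minimal sketch. The sub name format_record, the list of fields to print, and the decision to skip missing fields (so THRESHOLDID only appears when present) are assumptions made to match the output requested in the question.

```perl
use strict;
use warnings;

# Hypothetical helper built on the split-then-parse approach.
sub format_record {
    my ($record) = @_;
    chomp $record;

    # Separate "USMS <id>" from the key="value" list; limit 2 keeps the rest intact.
    my ($usms_part, $request) = split / \s* \|<REQ \s* /x, $record, 2;
    return unless defined $request;
    my ($usms_id) = $usms_part =~ /^USMS (\d+)$/;
    return unless defined $usms_id;

    # Collect every key="value" pair, whatever order the keys appear in.
    my %request;
    while ($request =~ /(\w+)="(.*?)"/g) {
        $request{$1} = $2;
    }

    # Pick the wanted fields in a fixed order; absent ones (e.g. THRESHOLDID) are dropped.
    my @fields = grep { defined } @request{qw(MSISDN CONTRACT SUBSCRIPTION THRESHOLDID TEXT)};
    return join ';', $usms_id, @fields;
}

my @records = (
    'USMS 1362224754632|<REQ MSISDN="00966590832186" CONTRACT="580" SUBSCRIPTION="AAA" FORMAT="ascii" TEXT="L2"',
    'USMS 1362224754632|<REQ MSISDN="00966590832186" CONTRACT="580" SUBSCRIPTION="BBB" THRESHOLDID="1" FORMAT="ascii" TEXT="L2"',
);
print format_record($_), "\n" for @records;
```

The first record prints as 1362224754632;00966590832186;580;AAA;L2, and the second as 1362224754632;00966590832186;580;BBB;1;L2. Because the fields are looked up by name in %request, adding or reordering keys in the input does not require changing the regex.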
Upvotes: 3
Reputation: 36262
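Pipe (|) is a special character in regular expressions: it separates alternatives, so your pattern matches either USMS (.*?) on its own or everything after the pipe. The first alternative succeeds immediately with an empty $1, which is why you get only semicolons. Escape it as \| and it will work. Note also that in your second sample line THRESHOLDID comes before FORMAT, so the first pattern must list them in that order:
if($Record =~ /USMS (.*?)\|<REQ MSISDN="(.*?)" CONTRACT="(.*?)" SUBSCRIPTION="(.*?)" THRESHOLDID="(.*?)" FORMAT="(.*?)" TEXT="(.*?)"/)
Do the same for the elsif branch.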
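Escaping the pipe is the essential fix. As a variant (not the exact code above), the two branches can also be folded into one pattern by making THRESHOLDID an optional non-capturing group. A minimal sketch, where the sub name parse_usms and the choice to leave FORMAT out of the result are assumptions based on the desired output in the question:

```perl
use strict;
use warnings;

# Hypothetical helper: one pattern for both record shapes.
# \| matches a literal pipe instead of acting as alternation.
# (?: THRESHOLDID="(.*?)")? is optional, so $5 is undef when the key is absent.
sub parse_usms {
    my ($record) = @_;
    return unless $record =~ /^USMS (\d+)\|<REQ MSISDN="(.*?)" CONTRACT="(.*?)" SUBSCRIPTION="(.*?)"(?: THRESHOLDID="(.*?)")? FORMAT="(.*?)" TEXT="(.*?)"/;
    # $6 (FORMAT) is deliberately skipped; undef THRESHOLDID is filtered out
    my @fields = ($1, $2, $3, $4, $5, $7);
    return join ';', grep { defined } @fields;
}

for my $record (
    'USMS 1362224754632|<REQ MSISDN="00966590832186" CONTRACT="580" SUBSCRIPTION="AAA" FORMAT="ascii" TEXT="L2"',
    'USMS 1362224754632|<REQ MSISDN="00966590832186" CONTRACT="580" SUBSCRIPTION="BBB" THRESHOLDID="1" FORMAT="ascii" TEXT="L2"',
) {
    my $line = parse_usms($record);
    print "$line\n" if defined $line;
}
```

This still depends on the keys appearing in a fixed order; if the order can vary, parsing the key/value pairs generically is more robust.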
Upvotes: 3