RAVJI

Reputation: 17

perl regular expression pattern matching

Input (GMF file):

CUSTEVSUMMROW_GPRS GPRS - Nova Subscriber Non-Smartphone Package|3126|GB| | 
CUSTEVSUMMROW_GPRS GPRS - Nova Subscriber Smartphone Package|3126|GB| | 
CUSTEVSUMMROW_GPRS GPRS - Nova Subscriber Non-Smartphone Package -  Charged|3126|GB|7500000|234446

In my Perl code, I use the following to extract the fields from each line:

if($line=~m/^(CUSTEVSUMMROW_GPRS|CUSTEVSUMMROW).*?\s(.*?)\|(\d+)\|.*\|(.*?)$/)
{
    $tag=$1;
    $lineTxt=$2;
    $usage = $3;
    $amt = $4;
}

output:

tag :: CUSTEVSUMMROW_GPRS  lineTxt :: GPRS - Nova Subscriber Non-Smartphone Package  usage :: 3126  amt ::
tag :: CUSTEVSUMMROW_GPRS  lineTxt :: GPRS - Nova Subscriber Smartphone Package  usage :: 3126  amt ::
tag :: CUSTEVSUMMROW_GPRS  lineTxt :: GPRS - Nova Subscriber Non-Smartphone Package - Charged usage :: 3126 amt :: 234446

How can I also retrieve and print the units used (MB or GB)? Can anyone please help me out?

Upvotes: 0

Views: 42

Answers (2)

Sobrique

Reputation: 53508

Given what you have there:

if($line=~m/^(CUSTEVSUMMROW_GPRS|CUSTEVSUMMROW).*?\s(.*?)\|(\d+)\|(.*?)\|.*\|(.*?)$/)
{
    $tag=$1;
    $lineTxt=$2;
    $usage = $3;
    $units = $4;
    $amt = $5;
}

But I'd suggest a single regex is not the best way to approach this problem. I'd use split on the pipe delimiter and process the first field separately.

Something like this maybe:

#!/usr/bin/env perl
use strict;
use warnings;

use Data::Dumper;

my @fields = qw ( tag lineTxt usage units amt );

while (<DATA>) {
    chomp;    #strip the trailing newline so it can't end up in the last field.
    my ( $first_field, @record )  = split /\|/;

    #split the first field on _just_ the first space.
    unshift( @record, $first_field =~ m/^(\w+) (.*)$/ );

    #use a hash slice to put that record into a hash of named keys.
    my %data;
    @data{@fields} = @record;
    print Dumper \%data;

    # can of course, make this an array of hashes quite easily. 
}


__DATA__
CUSTEVSUMMROW_GPRS GPRS - Nova Subscriber Non-Smartphone Package|3126|GB| | 
CUSTEVSUMMROW_GPRS GPRS - Nova Subscriber Smartphone Package|3126|GB| | 
CUSTEVSUMMROW_GPRS GPRS - Nova Subscriber Non-Smartphone Package -  Charged|3126|GB|7500000|234446

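Each record is printed as a hash dump. Note that @fields names only five columns, so the sixth value on each line (234446 in the last one) is discarded; add a sixth name to @fields if you need it. For the third input line, the output is (hash key order may vary):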

$VAR1 = {
          'units' => 'GB',
          'tag' => 'CUSTEVSUMMROW_GPRS',
          'amt' => '7500000',
          'usage' => '3126',
          'lineTxt' => 'GPRS - Nova Subscriber Non-Smartphone Package -  Charged'
        };
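
The comment in the loop mentions collecting the records into an array of hashes. A minimal sketch of that idea, assuming the same five field names and using the question's sample lines inline instead of a __DATA__ section (the @lines and @records names are only for illustration):

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my @fields = qw( tag lineTxt usage units amt );

# Sample lines copied from the question.
my @lines = (
    'CUSTEVSUMMROW_GPRS GPRS - Nova Subscriber Non-Smartphone Package|3126|GB| | ',
    'CUSTEVSUMMROW_GPRS GPRS - Nova Subscriber Smartphone Package|3126|GB| | ',
    'CUSTEVSUMMROW_GPRS GPRS - Nova Subscriber Non-Smartphone Package -  Charged|3126|GB|7500000|234446',
);

my @records;    # one hash reference per input line
for my $line (@lines) {
    my ( $first_field, @record ) = split /\|/, $line;

    # Split the first field on just the first space: tag, then description.
    unshift @record, $first_field =~ m/^(\w+) (.*)$/;

    my %data;
    @data{@fields} = @record;    # values beyond the five named keys are dropped
    push @records, \%data;
}

# The units are now available by name for every record.
print "$_->{lineTxt}: $_->{usage} $_->{units}\n" for @records;
```

Keeping every record in @records means the units can be looked up by name later, rather than by capture-group position.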

Upvotes: 1

choroba

Reputation: 242343

You don't capture the column after \d+. Add parentheses to do so.

.* is greedy, i.e. it matches as much as it can, so in your pattern \|.*\| swallows every column up to the last pipe. Add a ? to make it frugal (non-greedy):

if ($line =~ /^(CUSTEVSUMMROW_GPRS|CUSTEVSUMMROW).*?\s(.*?)\|(\d+)\|(.*?)\|/)

You can also rewrite the alternation as

(CUSTEVSUMMROW(?:_GPRS)?)
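
Putting these suggestions together, here is a minimal runnable sketch. It assumes the amount should still come from the last column, as in the question's output, so a greedy .*\| is kept in front of the final group. parse_line is a hypothetical helper name, not something from the original code.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Sample lines copied from the question.
my @lines = (
    'CUSTEVSUMMROW_GPRS GPRS - Nova Subscriber Non-Smartphone Package|3126|GB| | ',
    'CUSTEVSUMMROW_GPRS GPRS - Nova Subscriber Non-Smartphone Package -  Charged|3126|GB|7500000|234446',
);

# parse_line (hypothetical helper) returns tag, text, usage, units and amount,
# or an empty list when the line does not match.
sub parse_line {
    my ($line) = @_;
    return
        unless $line =~ /^(CUSTEVSUMMROW(?:_GPRS)?).*?\s(.*?)\|(\d+)\|(.*?)\|.*\|(.*?)$/;
    return ( $1, $2, $3, $4, $5 );
}

for my $line (@lines) {
    # A list assignment in boolean context yields the number of values returned,
    # so non-matching lines are skipped.
    my ( $tag, $lineTxt, $usage, $units, $amt ) = parse_line($line)
        or next;
    print "tag :: $tag  lineTxt :: $lineTxt  usage :: $usage  units :: $units  amt :: $amt\n";
}
```

The fourth group, (.*?)\|, is the new capture for the units column; being frugal, it stops at the first pipe after the usage value.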

Upvotes: 3
