
How can I extract a bunch of numbers from a string?

This is the sample test file:

  Barcode:*99899801000689811* 
  JSC4000I accountNumber:10006898Sequence Number:998 Envelopes: 1 
  LCD5010V Using jsl 'CUSOFF' for output page '6'
  Barcode:*99999901000673703* 
  LCD5010V Using jsl 'CUSOFF' for output page '4'
  LCD5005V Using job 'A' for current page '4'

In this file, how can I search for the word Barcode, extract the first five digits that follow it, and store them in an array?

Thanks in advance.

Upvotes: 1

Views: 1207

Answers (4)

Beano

Reputation: 7841

A pattern match in list context returns the values captured by the parentheses as a list. Combine this with the /g modifier to keep re-matching, and you can do it all in one line, which I like to think is very readable.

my $string =<<'HERE';
Barcode:*99899801000689811* 
JSC4000I accountNumber:10006898Sequence Number:998 Envelopes: 1 
LCD5010V Using jsl 'CUSOFF' for output page '6'
Barcode:*99999901000673703* 
LCD5010V Using jsl 'CUSOFF' for output page '4'
LCD5005V Using job 'A' for current page '4'
HERE

my @array = $string =~ m!Barcode:\*([0-9]{5})[0-9]+\*!g;

# or

foreach my $barcode ($string =~ m!Barcode:\*([0-9]{5})[0-9]+\*!g)
{
    # do stuff with $barcode
}

Upvotes: 0

brian d foy

Reputation: 132820

My solution is similar to Manni's, but I recommend using while to read a file line-by-line. You can use substr() like he does, but a regex with anchors and without quantifiers is going to be pretty fast:

my @barcodes;
while( <$fh> )
    {
    next unless m/^Barcode:\*([0-9]{5})/;

    push @barcodes, $1;
    }

Depending on what else I was doing, I might just use a map instead. The map expression is in list context, so the m// operator returns the list of things it matched in any parentheses:

my @barcodes = map { m/^Barcode:\*([0-9]{5})/ } <$fh>;

I suspect any real-life answer would have a bit more code to warn you about lines that start with Barcode: but are missing the number. I have yet to meet a perfect input file :)

The \G anchor picks up the regex matching where the last successful /g match on the same string left off, in this case right after the colon:

my @barcodes;
while( <$fh> )
    {
    # /g in scalar context records the end of the match in pos(), which \G needs
    next unless m/^Barcode:/g;

    unless( m/\G\*([0-9]{5})/ )
        {
        warn "Barcode is missing number at line $.\n";
        next;
        }

    push @barcodes, $1;
    }
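
For anyone who wants to try the \G approach without a file on disk, here is a minimal self-contained sketch. It reads from an in-memory filehandle, and the bare "Barcode:" line in the sample data is a made-up malformed line that is not in the question's file. The @bad_lines array is also my own addition so the result can be inspected instead of only warned about.

```perl
use strict;
use warnings;

# Sample data: two lines from the question plus one hypothetical
# malformed line (line 2) that has no number after the colon.
my $data = <<'END';
Barcode:*99899801000689811*
Barcode:
Barcode:*99999901000673703*
END

# Open a read handle on the string so no real file is needed.
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";

my ( @barcodes, @bad_lines );
while ( <$fh> ) {
    # Scalar-context /g stores the end of this match in pos($_).
    next unless m/^Barcode:/g;

    # \G anchors the second match at pos($_), just after the colon.
    unless ( m/\G\*([0-9]{5})/ ) {
        push @bad_lines, $.;    # remember the input line number
        next;
    }

    push @barcodes, $1;
}
close $fh;

print "barcodes: @barcodes\n";                  # barcodes: 99899 99999
print "lines without a number: @bad_lines\n";   # lines without a number: 2
```

Without the /g on the first match, pos($_) is never set and \G anchors at the start of the line, so every Barcode: line would be reported as missing its number.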

Upvotes: 0

innaM

Reputation: 47849

Regular expressions are one way to go. However, just to throw something completely different at you, here's how to handle that stuff with index and substr:

my @array;
foreach my $line ( <$file> ) {
    if ( index( $line, 'Barcode:' ) == 0 ) {
        push @array, substr $line, 9, 5;
    }
}
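
The offset 9 in the substr call comes from the nine characters in "Barcode:*". A runnable sketch of the same idea follows. It assumes each line starts at column 0 (the leading spaces in the question look like formatting), reads from an in-memory handle instead of a real file, and uses a $prefix variable that is my own naming. Unlike the snippet above, it also checks for the asterisk, and it reads with while so the whole file is not loaded at once.

```perl
use strict;
use warnings;

# Sample lines taken from the question.
my $data = <<'END';
Barcode:*99899801000689811*
LCD5010V Using jsl 'CUSOFF' for output page '6'
Barcode:*99999901000673703*
END

# Open a read handle on the string so no real file is needed.
open my $file, '<', \$data or die "Cannot open in-memory handle: $!";

my $prefix = 'Barcode:*';    # 9 characters, so the digits start at offset 9
my @array;
while ( my $line = <$file> ) {
    # index returns 0 only when the line begins with the prefix.
    next unless index( $line, $prefix ) == 0;

    # Take the five characters immediately after the prefix.
    push @array, substr( $line, length($prefix), 5 );
}
close $file;

print "@array\n";    # 99899 99999
```

Note that substr does not check that those five characters are digits, so a regex is the safer choice if the input might be malformed.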

Upvotes: 1

Andrew Hare

Reputation: 351536

Try a regular expression. Something like this ought to work:

Barcode:\*(\d{5})
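
As a rough sketch of how that pattern could be applied in Perl (the question does not show how the file is read, so this assumes the text is already in a string; $text and @prefixes are illustrative names):

```perl
use strict;
use warnings;

# Sample text copied from the question.
my $text = <<'END';
Barcode:*99899801000689811*
JSC4000I accountNumber:10006898Sequence Number:998 Envelopes: 1
Barcode:*99999901000673703*
END

# In list context with /g, the match returns every captured group,
# so each five-digit prefix lands directly in the array.
my @prefixes = $text =~ /Barcode:\*(\d{5})/g;

print "$_\n" for @prefixes;    # prints 99899, then 99999
```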

Upvotes: 6
