user399517

Reputation: 3623

How to truncate a string using a regular expression in Perl

I have the following strings in a file and want to truncate each one to no more than 6 characters. How can I do that using a regular expression in Perl?
The original file is:

cat shortstring.in:

<value>1234@google.com</value>
<value>1235@google.com</value>

I want the output file to be:
cat shortstring.out

<value>1234@g</value>
<value>1235@g</value>

I have the code below. Is there a more efficient way than using
s/<value>(\w\w\w\w\w\w)(.*)/$1/;?

Here is a part of my code:

    while (<$input_handle>) {                        # take one input line at a time
            chomp;
            if (/(\d+\@google\.com)/) {
                    s/(<value>\w\w\w\w\w\w)(.*)</value>/$1/;
                    print $output_handle "$_\n";
              } else {
              print $output_handle "$_\n";
            }
    }

Upvotes: 0

Views: 5718

Answers (5)

Greg Bacon

Reputation: 139531

$ perl -pe 's/(<value>[^<]{1,6})[^<]*/$1/' shortstring.in
<value>1234@g</value>
<value>1235@g</value>

In the context of the snippet from your question, use

while (<$input_handle>) {
  s!(<value>)(.*?)(</value>)!$1 . substr($2,0,6) . $3!e
    if /(\d+\@google\.com)/;
  print $output_handle $_;
}

or to do it with a single pattern

while (<$input_handle>) {
   s!(<value>)(\d+\@google\.com)(</value>)!$1 . substr($2,0,6) . $3!e;
  print $output_handle $_;
}

Using bangs as the delimiters on the substitution operator prevents Leaning Toothpick Syndrome in </value>.
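As a small side-by-side sketch of that point: the two substitutions below are meant to be equivalent, and only the delimiter differs. The sample line and the variable names are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample line; single quotes keep @google from interpolating.
my $with_slashes = '<value>1234@google.com</value>';
my $with_bangs   = $with_slashes;

# With / as the delimiter, the slash in </value> has to be escaped.
$with_slashes =~ s/(<value>)(.*?)(<\/value>)/$1 . substr($2, 0, 6) . $3/e;

# With ! as the delimiter, </value> can be written as-is.
$with_bangs =~ s!(<value>)(.*?)(</value>)!$1 . substr($2, 0, 6) . $3!e;

print "$with_slashes\n";
print "$with_bangs\n";
```

Both lines should print the same truncated element; any delimiter that does not occur in the pattern (for example `s{...}{...}`) would avoid the escaping equally well.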

NOTE: The usual warnings about “parsing” XML with regular expressions apply.

Demo program:

#! /usr/bin/perl

use warnings;
use strict;

my $input_handle = \*DATA;
open my $output_handle, ">&=", \*STDOUT or die "$0: open: $!";

while (<$input_handle>) {
   s!(<value>)(\d+\@google\.com)(</value>)!$1 . substr($2,0,6) . $3!e;
  print $output_handle $_;
}

__DATA__
<value>1234@google.com</value>
<value>1235@google.com</value>
<value>12@google.com</value>

Output:

$ ./prog.pl 
<value>1234@g</value>
<value>1235@g</value>
<value>12@goo</value>

Upvotes: 5

David Blevins

Reputation: 19378

Looks like you want to truncate the text inside the tag which could be shorter than 6 characters already, in which case:

s/(<value>[^<]{1,6})[^<]*/$1/
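A minimal sketch of how this pattern treats long, short, and empty values. The helper name `truncate_value` and the sample inputs are hypothetical, chosen only to exercise the substitution.

```perl
use strict;
use warnings;

# truncate_value is a hypothetical helper, not part of the answer above.
sub truncate_value {
    my ($line) = @_;
    # Keep up to 6 non-'<' characters after <value>, then drop
    # everything else up to the next '<'.
    $line =~ s/(<value>[^<]{1,6})[^<]*/$1/;
    return $line;
}

print truncate_value('<value>1234@google.com</value>'), "\n";  # long value is cut
print truncate_value('<value>abc</value>'), "\n";               # short value is kept
print truncate_value('<value></value>'), "\n";                  # empty value: no match
```

Because `{1,6}` requires at least one character, an empty element does not match and the line passes through unchanged.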

Upvotes: 1

Hut8

Reputation: 6342

Use this instead (regex is not the only feature of Perl, and it's overkill for this :-)):

$str = substr($str, 0, 6);

http://perldoc.perl.org/functions/substr.html
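A short sketch of how `substr` behaves on strings of different lengths. The helper name `first_six` and the sample strings are assumptions made for illustration.

```perl
use strict;
use warnings;

# first_six is a hypothetical helper wrapping the substr call above.
sub first_six {
    my ($str) = @_;
    # Offset 0, length 6: returns the whole string if it is shorter than 6.
    return substr($str, 0, 6);
}

for my $str ('1234@google.com', 'abc', '') {
    my $short = first_six($str);
    print "'$str' -> '$short'\n";
}
```

Note that this truncates a bare string. For the lines in the question, the value would still have to be separated from the surrounding `<value>` tags first.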

Upvotes: 10

Eugene Yarmash

Reputation: 149823

Try this:

s|(?<=<value>)(.*?)(?=</value>)|substr $1,0,6|e;
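A sketch of the lookaround version in use. The lookbehind and lookahead only assert that the tags are present, so the match (and therefore the replacement) covers just the text between them. The sample lines and the added `/g` flag are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample lines; single quotes keep @google from interpolating.
my $single = '<value>1235@google.com</value>';
my $double = '<value>aaaaaaaaa</value><value>bbbbbbbbb</value>';

# The tags are asserted, not consumed, so only the inner text is replaced.
$single =~ s|(?<=<value>)(.*?)(?=</value>)|substr $1, 0, 6|e;

# With /g, each element on the line is truncated in turn.
$double =~ s|(?<=<value>)(.*?)(?=</value>)|substr $1, 0, 6|ge;

print "$single\n";
print "$double\n";
```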

Upvotes: 1

Paul Tomblin

Reputation: 182782

s{<value>(.{1,6}).*}{<value>$1</value>};

Upvotes: 0
