sebastian

Reputation: 33

Delete every second word from a line using Perl?

What would be the best way to delete every second word (or, more precisely, every second string separated from its neighbours by whitespace) from a line of text in Perl?

From this:

  1. Mn Gordon Npmsc Snell Npmsc . Fe

I would like to retrieve only this:

  1. Gordon Snell .

I've tried to write a regular expression modeling this, but I've failed so far. Any suggestions would be appreciated!

Upvotes: 3

Views: 1796

Answers (5)

Zaid

Reputation: 37146

If whitespace-preservation is not a concern, the following works as a one-liner:

$ perl -pale '$_ = "@F[ grep { !($_ % 2) } 0..$#F ]"' input.dat

Testing the sample provided:

$ echo "1. Mn Gordon Npmsc Snell Npmsc . Fe" | perl -pale '$_ = "@F[ grep { !($_ % 2) } 0..$#F ]"'

Output

1. Gordon Snell .

Upvotes: 1

Mansoor Siddiqui

Reputation: 21693

Not sure if this is the most efficient way, but for simplicity, I would probably just call split on the input string, then iterate over the tokens, printing/saving only every second token:

my $sentence = "Once sdklf upon asdfk a lkdfj time...";
my $i = 0;
# Keep the tokens at positions 0, 2, 4, ... and print each one followed by a space.
print "$_ " for grep { !($i++ % 2) } split /\s+/, $sentence;

The above code prints the following:

Once upon a time... 

Upvotes: 3

Blagovest Buyukliev

Reputation: 43548

Something like this will do it:

my $i = 0;
my $line = "Mn Gordon Npmsc Snell Npmsc . Fe";
my @words = grep { $i++ % 2 } split /\s+/, $line;
print "@words\n"; # "Gordon Snell ."

Basically, you split the line into a list of whitespace-separated words, then keep only every second element (the ones at odd indices, counting from 0) in the target array.

Upvotes: 4

hmakholm left over Monica

Reputation: 23342

You could do something like s/\s+\S+(\s+|$)/$1/g, but that is not very readable. It would probably be clearer (if a little more verbose) to split on whitespace and explicitly select/print/whatever every second element of the resulting list.
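To compare the two approaches on the question's sample line, here is a minimal sketch. The helper names drop_by_regex and drop_by_split are made up for illustration, and it assumes the goal is to keep the 1st, 3rd, 5th, ... token. Note that the split version joins the kept tokens with a single space, so it does not preserve the original spacing.

```perl
use strict;
use warnings;

# Hypothetical helper: regex approach from above.
# Each match is "whitespace, one word, then whitespace or end of line";
# the word is dropped and only the trailing whitespace ($1) is kept.
sub drop_by_regex {
    my ($line) = @_;
    (my $copy = $line) =~ s/\s+\S+(\s+|$)/$1/g;
    return $copy;
}

# Hypothetical helper: split approach.
# Keeps the tokens at even indices (0, 2, 4, ...) and rejoins them.
sub drop_by_split {
    my ($line) = @_;
    my @tokens = split ' ', $line;
    return join ' ', @tokens[ grep { $_ % 2 == 0 } 0 .. $#tokens ];
}

my $line = "1. Mn Gordon Npmsc Snell Npmsc . Fe";
print drop_by_regex($line), "\n";   # 1. Gordon Snell .
print drop_by_split($line), "\n";   # 1. Gordon Snell .
```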

Upvotes: 2

mob

Reputation: 118665

s/(\s*\S+\s+)\S+\s*/$1/g seems to get you there, preserving whitespace at the beginning of the line and after the last undeleted word. It's not clear whether you want to preserve whitespace before or after the deleted word (or whether it matters).

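foreach my $line (
    "1. Mn Gordon Npmsc Snell Npmsc . Fe",
    "i've tried to write a regular expression modeling this,",
    "but i've failed so far. any suggestions appreciated!"
    ) {

    $_ = $line;                  # avoid $a, which sort() uses as a special variable
    s/(\s*\S+\s+)\S+\s*/$1/g;    # keep one word plus its trailing whitespace, drop the next word
    print "$_\n";

}

Output: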

1. Gordon Snell .
i've to a expression this,
but failed far. suggestions
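If you would rather not rely on a single regex, roughly the same whitespace behaviour can be had with split and a capturing separator. This is only a sketch under one assumption about what "preserve" should mean: leading whitespace stays, and each kept word keeps the whitespace that originally followed it. The function name drop_every_second_keep_ws is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: drop every second word, but keep the original
# whitespace that followed each kept word.
sub drop_every_second_keep_ws {
    my ($line) = @_;

    # Peel off leading whitespace so the first word lands at index 0.
    my ($lead, $rest) = $line =~ /\A(\s*)(.*)\z/s;

    # With a capturing group, split returns words and separators
    # alternately: word, sep, word, sep, ...
    my @parts = split /(\s+)/, $rest;

    # Indices 0,1 are a kept word and its separator, 2,3 a dropped pair,
    # 4,5 a kept pair, and so on.
    my @kept = @parts[ grep { $_ % 4 < 2 } 0 .. $#parts ];

    return $lead . join '', @kept;
}

print drop_every_second_keep_ws("a  b c   d e"), "\n";   # a  c   e
```

As with the regex above, a line with an even number of words ends up with trailing whitespace after the last kept word.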

Upvotes: 1
