Gary U.U. Unixuser

Reputation: 232

Perl: Replace consecutive spaces in this given scenario?

An excerpt of a big binary file ($data) looks like this:

\n1ax943021C               xxx\t2447\t5
\n1ax951605B               yyy\t10400\t6
\n1ax919275  G2L           zzz\t6845\t6

The first 25 characters contain an article number, padded with spaces. How can I convert all the spaces between the article number and the next column into a \x09? Note that there can be one or more spaces between different parts of the article number.

I tried a workaround, but that overwrites the article number with the literal text ".{25}xxx":

$data =~ s/\n.{25}/\n.{25}xxx/g;

Anyone able to help?

Thanks so much!

Gary

Upvotes: 1

Views: 123

Answers (4)

krisku

Reputation: 3991

I interpret the question as there being a 25-character-wide field that should have its trailing spaces stripped and then be delimited by a tab character before the next field. Spaces within the article number should otherwise be preserved (as in "1ax919275  G2L").

The following construct should do the trick:

$data =~ s/^(.{25})/{$t=$1;$t=~s! *$!\t!;$t}/emg;

That matches 25 characters from the beginning of each line in the data, then evaluates an expression for each article number by stripping its trailing spaces and appending a tab character.
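The same idea can be written so that it also runs under use strict. This is only a sketch: the helper name tabify_article_field and the sample records (built with sprintf to guarantee the 25-character padding) are assumptions for illustration, not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a copy of the buffer in which the padding
# after each 25-character article field is replaced by a single tab.
# /m lets ^ match at the start of every line, /e evaluates the replacement
# as Perl code, and /g applies it to every line in the buffer.
sub tabify_article_field {
    my ($data) = @_;
    $data =~ s{^(.{25})}{
        my $field = $1;
        $field =~ s/ +$//;
        "$field\t";
    }emg;
    return $data;
}

# Sample data padded to 25 characters, mirroring the question's layout.
my $data = sprintf("%-25s%s", "1ax943021C",     "xxx\t2447\t5\n")
         . sprintf("%-25s%s", "1ax919275  G2L", "zzz\t6845\t6\n");

print tabify_article_field($data);
```

Because "." does not match a newline, lines shorter than 25 characters are left untouched, and the two spaces inside "1ax919275  G2L" survive since only trailing spaces are stripped.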

Upvotes: 1

TLP

Reputation: 67900

You can use unpack for fixed width data:

use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Useqq=1;
print Dumper $_ for map join("\t", unpack("A25A*")), <DATA>;

__DATA__
1ax943021C               xxx    2447    5
1ax951605B               yyy    10400   6
1ax919275  G2L           zzz    6845    6

Output:

$VAR1 = "1ax943021C\txxx\t2447\t5";
$VAR1 = "1ax951605B\tyyy\t10400\t6";
$VAR1 = "1ax919275  G2L\tzzz\t6845\t6";

Note that Data::Dumper's Useqq option prints whitespace characters in their escaped form.

Basically, what I do here is take each line, unpack it as two strings of space-padded text (which removes all the trailing padding), join those strings back together with a tab, and print them. Note also that this preserves the spaces inside the article number of the last record.
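For use outside a one-liner, the unpack step can be wrapped in a small function. This is a minimal sketch, assuming one record per line; the name split_fixed_field and the sprintf-built sample records are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: "A25" reads the first 25 bytes and strips trailing
# whitespace, "A*" takes whatever remains of the line.
sub split_fixed_field {
    my ($line) = @_;
    my ($article, $rest) = unpack("A25 A*", $line);
    return join("\t", $article, $rest);
}

# Sample records padded to 25 characters, mirroring the question's layout.
my @lines = (
    sprintf("%-25s%s", "1ax943021C",     "xxx\t2447\t5"),
    sprintf("%-25s%s", "1ax919275  G2L", "zzz\t6845\t6"),
);

print split_fixed_field($_), "\n" for @lines;
```

Since the "A" format only trims trailing whitespace, spaces embedded in the article number are kept as they are.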

Upvotes: 2

fugu

Reputation: 6578

Not sure exactly what you want. This will match the two columns and print them out with all the original spaces. Let me know the desired output and I will fix it for you.

#!/usr/bin/perl -w
use strict; 

my @file = ('\n1ax943021C               xxx\t2447\t5', '\n1ax951605B               yyy\t10400\t6',
'\n1ax919275  G2L           zzz\t6845\t6');

foreach (@file) {
    my ($match1, $match2) = ($_ =~ /(\\n.{25})(.*)/);
    print "$match1'[insertsomethinghere]'$match2\n";
}

Output:

\n1ax943021C               '[insertsomethinghere]'xxx\t2447\t5
\n1ax951605B               '[insertsomethinghere]'yyy\t10400\t6
\n1ax919275  G2L           '[insertsomethinghere]'zzz\t6845\t6

Upvotes: 0

Toto

Reputation: 91488

Have a try with:

$data =~ s/ +/\t/g;

Upvotes: 0
