sinead

Reputation: 269

Perl - adding newline and tab characters after a fixed number of characters in a file?

I have a Perl question. I have a file in which each line contains a different number of As, Ts, Gs and Cs. The file looks like this:

ATCGCTGASTGATGCTG
GCCTAGCCCTTAGC
GTTCCATGCCCATAGCCAAATAAA 

I would like to add a line number for each line, then insert a \n every 6 characters, and then, on each of the new rows created, put a space every 3 characters.

The output should look like this:

Line NO 1                   
ATC GCT
GAS TGA
TGC TG

Line NO 2
GCC TAG
CCC TTA
GC 

I have come up with the code below:

my $count = 0;
my $line;
my $row;
my $split;
open(F, "Data.txt") or die "Can't read file: $!";
open (FH, " > UpDatedData.txt") or die "Can't write new file: $!";
while (my $line = <F>) {
    $count ++ ;
    $row = join ("\n",  ( $line =~ /.{1,6}/gs));
    $split = join ("\t",  ( $row =~ /.{3}/gs ));
    print FH "Line NO\t$count\n$split\n";
}
close F;
close FH;

However, it gives the following output:

Line NO 1                   
ATC GCT
GA  STG A
T   GCT G

Line NO 2
GCC TAG
CC  CTT A
G   C 

This must have something to do with the \n being counted as a character in this line of code:

$split = join ("\t",  ( $row =~ /.{3}/gs ));

Does anyone have any idea how to get around this problem?

Any help would be greatly appreciated.

Thanks in advance

Sinead

Upvotes: 1

Views: 4019

Answers (3)

Israel Fimbres D.

Reputation: 11

This should solve your problem:

use strict;
use warnings;

while (<DATA>) {
  s/(.{3})(.{0,3})?/$1 $2 /g;
  s/(.{7}) /$1\n/g;

  printf "Line NO %d\n%s\n", $., $_;
}

__DATA__
ATCGCTGASTGATGCTG
GCCTAGCCCTTAGC
GTTCCATGCCCATAGCCAAATAAA

Upvotes: 1

TLP

Reputation: 67908

This is a one-liner:

perl -plwe 's/(.{3})(.{0,3})/$1 $2\n/g' data.txt

The regex matches 3 characters (the dot does not match a newline), followed by 0 to 3 more characters, and captures both groups. It then inserts a space between them and a newline after.
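To see what that substitution does outside of a one-liner, here is a minimal sketch that wraps it in a sub. The name split_codons is made up for illustration and is not part of the answer above; the regex itself is unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption): format one sequence
# into rows of two space-separated triplets.
sub split_codons {
    my ($seq) = @_;
    # 3 characters, then up to 3 more; space between, newline after.
    $seq =~ s/(.{3})(.{0,3})/$1 $2\n/g;
    return $seq;
}

print split_codons("ATCGCTGASTGATGCTG");
# ATC GCT
# GAS TGA
# TGC TG
```

Note that a leftover of fewer than 3 characters (such as the final GC in the second sample line) is not matched at all, so it stays as-is with no newline added. In the one-liner, -l supplies that final newline on print. A row consisting of exactly one triplet gets a trailing space, because the second group matches the empty string.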

To keep track of the line numbers, you can add

s/^/Line NO $.\n/;

This numbers the output by input line number. If you prefer, you can keep a simple counter instead, such as ++$i.
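As a rough illustration of the two numbering options, the sketch below reads from an in-memory filehandle standing in for data.txt. The names $data, $in, $i and @out are assumptions made for this example only.

```perl
use strict;
use warnings;

# In-memory "file" standing in for data.txt (assumption for the demo).
my $data = "ATCGCT\nGCCTAG\n";
open my $in, '<', \$data or die "Can't open in-memory file: $!";

my $i = 0;      # manual counter
my @out;
while (my $line = <$in>) {
    chomp $line;
    ++$i;
    # $. is the line number of the last filehandle read from.
    push @out, "Line NO $. (counter: $i) $line";
}
close $in;

print "$_\n" for @out;
# Line NO 1 (counter: 1) ATCGCT
# Line NO 2 (counter: 2) GCCTAG
```

For a single input file the two agree. A manual counter is mainly useful if you want to skip lines or number across several files in your own way.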

  • The -l option handles newlines for you: with -p it chomps each input line and adds the newline back when printing.

You can also do it in two stages, like so:

perl -plwe 's/.{6}\K/\n/g; s/^.{3}\K/ /gm;'

This uses the \K (keep) escape sequence to keep the part of the string matched so far, so each substitution only inserts text: first a newline after every 6 characters, then a space 3 characters after each line beginning. With the /m modifier, ^ also matches right after each embedded newline, not just at the start of the string.
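The same two stages can be tried in a small script. This is only a sketch: the sub name split_two_stage is invented for the example, and \K needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption): the two-stage variant.
sub split_two_stage {
    my ($seq) = @_;
    $seq =~ s/.{6}\K/\n/g;     # stage 1: newline after each full run of 6
    $seq =~ s/^.{3}\K/ /gm;    # stage 2: space after the first 3 chars of each row
    return $seq;
}

print split_two_stage("ATCGCTGASTGATGCTG"), "\n";
# ATC GCT
# GAS TGA
# TGC TG
```

Because stage 1 only fires on full runs of 6 characters, a shorter last row gets no newline of its own (hence the explicit "\n" in the print, which -l does for you in the one-liner). Stage 2 only fires on rows of at least 3 characters, so a 2-character remainder is left untouched.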

So, in short:

perl -plwe 's/.{6}\K/\n/g; s/^.{3}\K/ /gm; s/^/Line NO $.\n/;' data.txt
perl -plwe 's/(.{3})(.{0,3})/$1 $2\n/g;    s/^/Line NO $.\n/;' data.txt

Upvotes: 0

choroba

Reputation: 241908

Another solution. Note that it uses lexical filehandles and the three-argument form of open.

#!/usr/bin/perl
use warnings;
use strict;

open my $IN,  '<', 'Data.txt'        or die "Can't read file: $!";
open my $OUT, '>', 'UpDatedData.txt' or die "Can't write new file: $!";
my $count = 0;
while (my $line = <$IN>) {
    chomp $line;
    $line =~ s/(...)(...)/$1 $2\n/g;         # Create pairs of triples
    $line =~ s/(\S\S\S)(\S{1,2})$/$1 $2\n/;  # A triple plus something at the end.
    $line .= "\n" if $line !~ /\n$/;         # A triple or less at the end.
    $count++;
    print $OUT "Line NO\t$count\n$line\n";
}
close $OUT;

Upvotes: 0
