Reputation: 269
I have a Perl question. I have a file each line of this file contains a different number of As Ts Gs and Cs The file looks like below
ATCGCTGASTGATGCTG
GCCTAGCCCTTAGC
GTTCCATGCCCATAGCCAAATAAA
I would like to add line number for each line Then insert a \n every 6 characters and then on each of the new rows created put an Empty space every 3 characters
Example of the output should be
Line NO 1
ATC GCT
GAS TGA
TGC TG
Line NO 2
GCC TAG
CCC TTA
GC
I have come up with the code below:
my $count = 0;
my $line;
my $row;
my $split;
open(F, "Data.txt") or die "Can't read file: $!";
open (FH, " > UpDatedData.txt") or die "Can't write new file: $!";
while (my $line = <F>) {
$count ++ ;
$row = join ("\n", ( $line =~ /.{1,6}/gs));
$split = join ("\t", ( $row =~ /.{3}/gs ));
print FH "Line NO\t$count\n$split\n";
}
close F;
close FH;
However
It gives the following out put
Line NO 1
ATC GCT
GA STG A
T GCT G
Line NO 2
GCC TAG
CC CTT A
G C
This must have something with the \n being counted as a character in this line of code
$split = join ("\t", ( $row =~ /.{3}/gs ));
Any one got any idea how to get around this problem?
Any help would be greatly appreciated.
Thanks in advance
Sinead
Upvotes: 1
Views: 4019
Reputation: 11
This should solve your problem:
use strict;
use warnings;
while (<DATA>) {
s/(.{3})(.{0,3})?/$1 $2 /g;
s/(.{7}) /$1\n/g;
printf "Line NO %d\n%s\n", $., $_;
}
__DATA__
ATCGCTGASTGATGCTG
GCCTAGCCCTTAGC
GTTCCATGCCCATAGCCAAATAAA
Upvotes: 1
Reputation: 67908
This is a one-liner:
perl -plwe 's/(.{3})(.{0,3})/$1 $2\n/g' data.txt
The regex looks for 3 characters (does not match newline), followed by 0-3 characters and captures both of those, then inserts a space between them and newline after.
To keep track of the line numbers, you can add
s/^/Line NO $.\n/;
Which will enumerate based on input line number. If you prefer, you can keep a simple counter, such as ++$i
.
-l
option will handle newlines for you.You can also do it in two stages, like so:
perl -plwe's/.{6}\K/\n/g; s/^.{3}\K/ /gm;'
Using the \K
(keep) escape sequence here to keep the matched part of the string, and then simply inserting a newline after 6 characters, and then a space 3 characters after "line beginnings", which with the /m
modifier also includes newlines.
So, in short:
perl -plwe 's/.{6}\K/\n/g; s/^.{3}\K/ /gm; s/^/Line NO $.\n/;' data.txt
perl -plwe 's/(.{3})(.{0,3})/$1 $2\n/g; s/^/Line NO $.\n/;' data.txt
Upvotes: 0
Reputation: 241908
Another solution. Note that it uses lexical filehandles and three argument form of open
.
#!/usr/bin/perl
use warnings;
use strict;
open my $IN, '<', 'Data.txt' or die "Can't read file: $!";
open my $OUT, '>', 'UpDatedData.txt' or die "Can't write new file: $!";
my $count = 0;
while (my $line = <$IN>) {
chomp $line;
$line =~ s/(...)(...)/$1 $2\n/g; # Create pairs of triples
$line =~ s/(\S\S\S)(\S{1,2})$/$1 $2\n/; # A triple plus something at the end.
$line .= "\n" if $line !~ /\n$/; # A triple or less at the end.
$count++;
print $OUT "Line NO\t$count\n$line\n";
}
close $OUT;
Upvotes: 0