Reputation: 51
I need a Perl script to concatenate lines.
I have more than 1000 gene names (e.g. >pmpI) and their functions (e.g. polymorphic outer membrane protein) on separate lines. I want to join each function onto the line with its gene name, so that the data is easier to view in the future and to save for further reference.
For example, the file contents look like this:
>pmpG
polymorphic outer membrane protein
>pmpH
polymorphic outer membrane protein
>CTA_0953
hypothetical protein
>pmpI
polymorphic outer membrane protein
I tried to do this manually in Excel, but that isn't feasible for many files, so I thought I'd ask programmers for help.
I need a Perl script to concatenate the lines.
The program output should look like this:
>pmpG polymorphic outer membrane protein
>pmpH polymorphic outer membrane protein
>CTA_0953 hypothetical protein
>pmpI polymorphic outer membrane protein
Upvotes: 1
Views: 88
Reputation: 69264
With some explanatory comments...
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
# Store the current line
my $line;
while (<DATA>) {
# Remove the newline
chomp;
# If the line starts with '>'
if (/^>/) {
# Output the current $line
# (if we have one)
say $line if defined $line;
# Set $line to this line
$line = $_;
} else {
# Append this line to $line
$line .= "\t$_";
}
}
# Output the last record (if the input was not empty)
say $line if defined $line;
__DATA__
>pmpG
polymorphic outer membrane protein
>pmpH
polymorphic outer membrane protein
>CTA_0953
hypothetical protein
>pmpI
polymorphic outer membrane protein
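Since the question mentions many files, here is a minimal sketch of the same idea that reads real files instead of __DATA__. It assumes each record is a '>' header followed by one or more description lines, joins them with a single space to match the requested output, and writes the result for each input file to a new file with a ".joined" suffix. The suffix and the subroutine name join_records are hypothetical choices; change them as needed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Join each '>' header line with the description line(s) that follow it.
# Takes a list of lines (with or without trailing newlines) and returns
# a list of joined records, without trailing newlines.
# (join_records is a hypothetical name chosen for this sketch.)
sub join_records {
    my @records;
    for my $line (@_) {
        chomp(my $copy = $line);
        $copy =~ s/\r$//;                # tolerate Windows (CRLF) line endings
        next if $copy =~ /^\s*$/;        # skip blank lines
        if ($copy =~ /^>/) {
            push @records, $copy;        # a header starts a new record
        } elsif (@records) {
            $records[-1] .= " $copy";    # append description to current record
        }
        # lines before the first header are silently ignored
    }
    return @records;
}

# Process every file named on the command line. The result for FILE is
# written to FILE.joined (the ".joined" suffix is only an example).
for my $in (@ARGV) {
    open my $in_fh, '<', $in or die "Cannot read $in: $!";
    my @joined = join_records(<$in_fh>);
    close $in_fh;

    my $out = "$in.joined";
    open my $out_fh, '>', $out or die "Cannot write $out: $!";
    print {$out_fh} "$_\n" for @joined;
    close $out_fh or die "Cannot close $out: $!";
}
```

Saved as, say, join.pl, it could be run over a whole directory with perl join.pl *.txt. Because each file is read and written separately, the originals are left untouched.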
Upvotes: 0
Reputation: 3029
As a single-line command, this would be:
perl -n -e 's/^\s+//; s/\s+$//; next if $_ eq ""; if (/^>/) { $n = $_; } else { printf "%-10s %s\n", $n, $_; }' < data.txt
For clarity, here is the same logic written as a Perl program:
#!/usr/bin/perl
use strict;
use warnings;
my $n;                          # the most recent '>' header line
while (<>) {                    # iterate over all lines
    s/^\s+//;                   # remove whitespace at the beginning...
    s/\s+$//;                   # ...and the end of the line (including the newline)
    next if $_ eq "";           # ignore empty lines
    if (/^>/) { $n = $_; }      # if the line starts with >, remember it
    else { printf "%-10s %s\n", $n, $_; }   # otherwise output the remembered
                                # header, a space, and the current line
}
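This reads your content from standard input, so call it as:
perl program.pl < data.txt
The content is expected to be in data.txt; change this to your actual file name. Because the script reads from <>, you can also pass one or more file names as arguments instead (perl program.pl data.txt).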
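Another possible approach, not used in either answer above, is to read each file as a single string and replace every newline that is followed by a character other than '>' with a space. This is only a sketch: it assumes the files contain no blank lines and that no description line itself starts with '>'. The subroutine name join_text is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Join description lines onto their '>' header by rewriting newlines.
# A newline followed by any character other than '>' becomes a space;
# a newline just before a '>' header, or at the very end of the text,
# is kept. (join_text is a hypothetical name chosen for this sketch.)
sub join_text {
    my ($text) = @_;
    $text =~ s/\r\n/\n/g;          # normalise Windows line endings
    $text =~ s/\n(?=[^>])/ /g;     # join non-header lines onto the previous line
    return $text;
}

# For every file named on the command line, print the joined text to STDOUT.
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot read $file: $!";
    my $text = do { local $/; <$fh> };   # slurp mode: read the whole file at once
    close $fh;
    print join_text($text);
}
```

The lookahead (?=[^>]) needs an actual following character, which is why the final newline of the file survives. The trade-off compared with the line-by-line answers is that the whole file is held in memory, which should be fine for a few thousand genes.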
Upvotes: 3