user2255200

Reputation: 11

Removing whitespace and line breaks between delimiters in Perl

I am new to Perl and have been trying to sort out an issue without success. I am trying to read data from a text file. The code is:

open FH, 'D:\Learning\Test.txt' or die $!;
my @data_line;
while (<FH>)
{
@data_line = split (/\|\~/);
print @data_line;
}

The file content is like this:

101|~John|~This line is
broken and showing 
space in print|~version123|~data|~|~|~
102|~Abrahim|~This is a line to be print|~version1.3|~|~|~|~

And the output is:

101JohnThis line is    
broken and showing
space in printversion123data
102AbrahimThis is a line to be printversion1.3

I just want to show the data in one line between the delimiters like:

101JohnThis line is broken and showing space in printversion123data
102AbrahimThis is a line to be printversion1.3

Please suggest what I should do. I also tried chomp(@data_line), but it did not work. I am using Windows.

I want to insert these "|~"-separated values into different fields of a table. I added $_ =~ s/\n//g; before @data_line = split (/\|\~/); and the output then printed as required, but the data is not inserted properly into my database table. What should I do? Thanks in advance.

Upvotes: 1

Views: 1022

Answers (4)

David W.

Reputation: 107040

A slight rewrite:

use strict;
use warnings;
use feature qw(say);               #See note #1

use autodie;                       #See note #2

use constant FILE => 'D:/Learning/Test.txt';  #See note #3

open my $fh, "<", FILE;            #See note #4
my $desired_output;
while ( my $line = <$fh> ) {       #See note #5
    chomp $line;                   #See note #6
    $line =~ s/\|~//g;
    if ( $desired_output ) {
       if ( $line =~ /^\d+/ ) {
           $desired_output .= "\n$line";
       }
       else {
           $desired_output .= " $line";
       }
    }
    else {                         #See note #7
       $desired_output = $line;
    }
}
close $fh;                         #See note #8
say "$desired_output";

Instead of using split, why not simply remove the field separators entirely with the substitute command? Also note that I save the output as one continuous line. The interior if structure is a bit more complex than I like, but it's pretty easy to follow. If there is no data in $desired_output, I simply set $desired_output equal to my line. Otherwise, I'll check to see if $line begins with a number. If it does, I'll append a \n to $desired_output and then append $line. Otherwise, I append a space and then $line.
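If the fields are needed separately (the question mentions loading them into table columns), the same line-joining idea can be combined with split. The following is only a minimal sketch: parse_records is a hypothetical helper, not part of the code above, and it relies on the same assumption that every new record starts with a digit.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): folds physical lines
# into logical records, then splits each record on the "|~" delimiter.
# Assumes, like the code above, that every new record starts with a digit.
sub parse_records {
    my @lines = @_;
    my @records;
    for my $line (@lines) {
        $line =~ s/\s+\z//;                 # drop the newline and trailing spaces
        if ( @records and $line !~ /^\d/ ) {
            $records[-1] .= " $line";       # continuation of the previous record
        }
        else {
            push @records, $line;           # start of a new record
        }
    }
    # A limit of -1 keeps trailing empty fields, which split drops by default.
    return map { [ split /\|~/, $_, -1 ] } @records;
}

# Sample lines copied from the question.
my @rows = parse_records(
    "101|~John|~This line is\n",
    "broken and showing \n",
    "space in print|~version123|~data|~|~|~\n",
    "102|~Abrahim|~This is a line to be print|~version1.3|~|~|~|~\n",
);
print join( ' / ', @$_ ), "\n" for @rows;
```

Each element of @rows is an array reference holding one record's fields, so the empty trailing columns keep their positions when the values are bound to an insert statement.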

Now for my notes. This is more or less written in what is now called the standard Perl style. This includes some good advice (use strict, warnings, etc) and the way modern programs are now laid out. For example, use underscores to separate out words in variable names instead of camel casing them ($desired_output vs. $desiredOutput). A lot of this is covered in Damian Conway's Perl Best Practices. These might not be the way I'd like to do things, but I do them because it's what everyone else is doing. And, it's usually more important to follow a standard than to complain about it. It's about maintenance and readability. You follow the crowd.

  1. Always put these three lines on all of your programs. The first two will catch 90% of your programming errors and the use feature qw(say); allows you to use say instead of print. It saves you from having to add a \n at the end, and this can be a bit more important than it sounds right now. Trust me, you would rather use say instead of print when possible.

  2. use autodie handles many situations in Perl when your program should not continue to run. For example, if you can't read in your file, you might as well not continue your program. The nice thing about autodie is that it will stop your program short when you forget to test the return value of your commands.

  3. When something doesn't change, you should make it a constant. This puts all of your unchanging data in one place, and it allows you to define mystery numbers like PI = 3.1416. Unfortunately, constants cannot be easily interpolated into output unless you know the Perl deep dark secret.

  4. When you open a file, use the three parameter form of the open command, and use scalar file handles. You can more easily pass a scalar file handle to a subroutine than you can with the older global handle.

  5. Don't use $_, the automatic variable unless you have to (like in grep or map). It doesn't improve readability or speed up execution. And, it has the propensity of getting you into trouble. It's a global variable in all packages and can be affected without you even knowing it.

  6. I always chomp every time I read in data that could possibly have a newline on the end, even if the newline might prove handy later. Newlines on the end of lines can cause all sorts of consternation with regular expressions. It is tempting to fold this into the while itself, as in while ( chomp ( my $line = <$fh> ) ), but that tests chomp's return value (the number of characters removed) rather than whether a line was read, so a final line without a newline would be skipped. It doesn't add to readability or speed either.

  7. Note my indentations and the way I use parentheses. This is now the preferred standard. It took me several years of unlearning the way it's done in Pascal and K&R style C to do it this way. Might as well learn it the right way early on.

  8. Always close file handles when you're done with them. It's just good form.
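The "deep dark secret" in note 3 is not spelled out; it most likely refers to the @{[ ... ]} idiom, which builds an anonymous array and dereferences it inside the string. A small sketch under that assumption, reusing the FILE constant from the code above:

```perl
use strict;
use warnings;

use constant FILE => 'D:/Learning/Test.txt';

# A bare constant name inside a string is treated as plain text.
my $plain = "Reading FILE";

# Option 1: concatenate the constant with the surrounding strings.
my $joined = 'Reading ' . FILE;

# Option 2: wrap it in an anonymous array and dereference it in place.
my $inline = "Reading @{[ FILE ]}";

print "$plain\n$joined\n$inline\n";
```

Concatenation is usually the easier form to read; the inline idiom is mainly useful inside long strings or here-documents.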

Upvotes: 1

Suic

Reputation: 2501

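This one-liner will help you, but it will change your input file. Note that it also joins every record onto a single line, because each newline is replaced with a space: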

perl -pi -e 's/\|\~//g;s/\n/ /g' test.txt

Upvotes: 0

cartman

Reputation: 752

open FH, 'D:\Learning\Test.txt' or die $!;
my @data_line;
while (<FH>)
{
chomp;
@data_line = split (/\|\~/);
print @data_line;
}

You can use chomp to remove the trailing '\n' from each line of the file.

Upvotes: 0

erickb

Reputation: 6309

You need to chomp the "it" variable ($_) just before you split.

while (<FH>)
{
chomp ($_);
@data_line = split (/\|\~/);
print @data_line;
}

I usually use an explicit variable to make it more readable.

while ( my $line= <FH> )
{
   chomp ($line);
   ...

Upvotes: 0
