geeb.24

Reputation: 557

Remove Line from File Based on Column Value in Perl

I want to loop through multiple files and through the lines in each file. I have already done this successfully. What I want to do now is remove lines from a file based on a numeric value in one of the columns.

If I have an input such as this:

 XP.sta1    -41.5166    0.0513    0.6842    0.1794    0  CPHI.BHZ   300.2458   -42.2436
 XP.sta2      3.5972    0.0500    0.7699    0.1213    0  E000.BHZ   300.5616     2.5545
 XP.sta3      3.7112    0.0267    0.7813    0.1457    0  E002.BHZ   300.6140     2.6160
 XP.sta4      4.2891    0.0214    0.6870    0.1308    0  E004.BHZ   301.2073     2.6006

where the ninth column is the one I want to examine. Call the value in column 9 $time: if $time is greater than 10 or less than -10, I want to remove the entire line. So far I have tried this:

unless (($time < -10) || ($time > 10)) {
print OUT2 ($stlat,"  ",$stlon,"  ",$eqlat,"  ",$eqlong,"  ",$eqdepth,"  ",$time,"\n");
}}

However I get the following output:

 XP.sta1    -41.5166    0.0513    0.6842    0.1794    0  CPHI.BHZ   300.2458   2.5545
 XP.sta2      3.5972    0.0500    0.7699    0.1213    0  E000.BHZ   300.5616    2.6160
 XP.sta3      3.7112    0.0267    0.7813    0.1457    0  E002.BHZ   300.6140     2.6006
 XP.sta4      4.2891    0.0214    0.6870    0.1308    0  E004.BHZ   301.2073 

As you can see, the entire line isn't deleted -- just the value that meets the true 'unless' condition, and then the other values move up in the 9th column. How do I delete the entire line, rather than just the ninth column number?

Here's where I wish to edit my script:

open(TABLEC,$File);
    @tablec = <TABLEC>;
    for ($j = 2; $j < $stop; $j++) {
       chomp ($tablec[$j]);
       ($netSta,$delayTime) = (split /\s+/,$tablec[$j])[1,9] ;  
        } 

In this for loop, I'm looping through each file, reading in the lines from 2 to 'stop', and chomping the newline character. I assign the 9th column to the delay time variable. So I'm looping through each line, but I don't want to print anything yet (that comes later in my script). I would just like to remove the entire line, so that when I print the lines later in my script, any line whose 9th column has an absolute value greater than 10 no longer exists.

Upvotes: 0

Views: 859

Answers (3)

Borodin

Reputation: 126722

I thought your question had been answered, but here's something that should help you with the contents of your edit.

Some points on your code

  • Identifiers for lexical variables should contain only lower-case letters, decimal digits, and underscore. Capital letters are reserved for global variables such as constants and package names

  • You should use lexical file handles with the three-parameter form of open

  • You should always verify that an open succeeded. In the case of a failure your program should die and include the value of $! in the die string to reveal why the operation failed

    Together, those points mean that

    open(TABLEC, $File);
    

    becomes

    open my $tablec_fh, '<', $File or die qq{Unable to open "$File" for input: $!};
    
  • You can chomp an entire array at once with chomp @tablec

  • You should avoid the C-style for loop as it is rarely a good choice. Perl allows you to iterate over a range, and you should make use of that. So

    for ($j = 2; $j < $stop; $j++) { ... }
    

    becomes

    for my $j ( 2 .. $stop-1 ) { ... }
    
  • split /\s+/ should almost always be split ' '. The latter is a special case for the operator, which prevents it from returning an initial empty field if the parameter string has leading spaces. If you call split without any parameters then it defaults to split ' ', $_
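To illustrate the last point, here is a small sketch using one sample line from the question (which starts with a space). The variable names are only for illustration. With split /\s+/ the line yields an empty first field, so the delay time lands at index 9; with split ' ' the station name is field 0 and the delay time is at index 8.

```perl
use strict;
use warnings;

# One sample line from the question; note the leading space.
my $line = ' XP.sta2      3.5972    0.0500    0.7699    0.1213    0  E000.BHZ   300.5616     2.5545';

# split /\s+/ keeps an empty leading field, so every column shifts by one.
my @by_regex = split /\s+/, $line;

# split ' ' skips leading whitespace, so the station name is field 0.
my @by_space = split ' ', $line;

printf "regex: %d fields, first is '%s', delay at index 9 is %s\n",
    scalar @by_regex, $by_regex[0], $by_regex[9];
printf "space: %d fields, first is '%s', delay at index 8 is %s\n",
    scalar @by_space, $by_space[0], $by_space[8];
```

This is probably why the original code used the indices [1,9] rather than [0,8].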

Here's a rewrite of your sample code that takes these points into account. I hope it's a better fit than my previous answer

open my $tablec_fh, '<', $File or die qq{Unable to open "$File" for input: $!};
my @tablec = <$tablec_fh>;
chomp @tablec;
close $tablec_fh;

for my $i ( 2 .. $stop-1 ) {
  my $row = $tablec[$i];
  my ($net_sta, $delay_time) = (split ' ', $row)[0,8];
  next unless abs($delay_time) <= 10;

  # Do stuff with $row
} 
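If the goal is for the unwanted lines to be gone from the array before any later printing, one option is to filter the array once with grep. This is only a sketch: keep_small_delays is a hypothetical helper name, the sample lines stand in for the real contents of @tablec, and it assumes every element is a data line with at least nine columns (header lines would need to be sliced off first).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): returns only the
# lines whose ninth column has an absolute value of at most $limit.
sub keep_small_delays {
    my ($limit, @lines) = @_;
    return grep { abs( (split ' ', $_)[8] ) <= $limit } @lines;
}

# Sample lines from the question stand in for the real file contents.
my @tablec = (
    ' XP.sta1    -41.5166    0.0513    0.6842    0.1794    0  CPHI.BHZ   300.2458   -42.2436',
    ' XP.sta2      3.5972    0.0500    0.7699    0.1213    0  E000.BHZ   300.5616     2.5545',
    ' XP.sta3      3.7112    0.0267    0.7813    0.1457    0  E002.BHZ   300.6140     2.6160',
    ' XP.sta4      4.2891    0.0214    0.6870    0.1308    0  E004.BHZ   301.2073     2.6006',
);

# Overwrite the array with the filtered list; rejected lines no longer exist.
@tablec = keep_small_delays(10, @tablec);

print "$_\n" for @tablec;
```

After the assignment, any later loop over @tablec sees only the three lines for sta2, sta3 and sta4, so no per-line check is needed at print time.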

Upvotes: 0

fugu

Reputation: 6578

I'd just skip the line:

use warnings;
use strict; 

while(<DATA>){
    my @split = split;
    next if $split[8] > 10 or $split[8] < -10;
    print;    # $_ still ends with its newline, so don't append another
}

Output, given the question's four sample lines under __DATA__:

 XP.sta2      3.5972    0.0500    0.7699    0.1213    0  E000.BHZ   300.5616     2.5545
 XP.sta3      3.7112    0.0267    0.7813    0.1457    0  E002.BHZ   300.6140     2.6160
 XP.sta4      4.2891    0.0214    0.6870    0.1308    0  E004.BHZ   301.2073     2.6006

Upvotes: 1

Borodin

Reputation: 126722

You haven't shown enough of your code to diagnose the problem, but what you ask is very simply done like this:

use strict;
use warnings;

while ( <DATA> ) {
  print unless abs((split)[8]) > 10;
}

__DATA__
 XP.sta1    -41.5166    0.0513    0.6842    0.1794    0  CPHI.BHZ   300.2458   -42.2436
 XP.sta2      3.5972    0.0500    0.7699    0.1213    0  E000.BHZ   300.5616     2.5545
 XP.sta3      3.7112    0.0267    0.7813    0.1457    0  E002.BHZ   300.6140     2.6160
 XP.sta4      4.2891    0.0214    0.6870    0.1308    0  E004.BHZ   301.2073     2.6006

output

 XP.sta2      3.5972    0.0500    0.7699    0.1213    0  E000.BHZ   300.5616     2.5545
 XP.sta3      3.7112    0.0267    0.7813    0.1457    0  E002.BHZ   300.6140     2.6160
 XP.sta4      4.2891    0.0214    0.6870    0.1308    0  E004.BHZ   301.2073     2.6006

Upvotes: 0
