JDY

Reputation: 167

Deleting a line with a pattern unless another pattern is found?

I have a very messy data file that can look something like this:

========
Line 1
dfa====dsfdas==
Line 2 
df  as TOTAL ============

I would like to delete all the lines with "=" only in them, but keep the line if TOTAL is also in the line.

My code is as follows:

for my $file (glob '*.csv') {
    open my $in, '<', $file;        
    my @lines;
    while (<$in>) {
        next if /===/; #THIS IS THE PROBLEM
        push @lines, $_;
    }   
    close $in;
    open my $out, '>', $file;
    print $out $_ for @lines;
    close $out;
}

I was wondering if there is a way to do this in Perl with regular expressions. I was thinking of letting "TOTAL" be condition 1 and "===" be condition 2. Then, perhaps, if both conditions are satisfied, the script leaves the line alone, but if only one or zero are fulfilled, the line is deleted?

Thanks in advance!

Upvotes: 1

Views: 55

Answers (4)

Sobrique

Reputation: 53498

As a general rule, you should avoid making your regexes more complicated. Compressing too many things into a single regex may seem clever, but it makes it harder to understand and thus debug.

So why not just do a compound condition?

E.g. like this:

#!/usr/bin/env perl
use strict;
use warnings;

my @lines;
while (<DATA>) {
    next if ( m/===/ and not m/TOTAL/ );
    push @lines, $_;
}

print $_ for @lines;

__DATA__
========
Line 1
dfa====dsfdas==
Line 2 
df  as TOTAL ============

This skips any line containing ===, unless it also contains TOTAL. It also doesn't need advanced regex features, which I assure you will have your maintenance programmers cursing you.
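The same compound condition can be dropped back into the file-rewriting loop from the question. The following is only a minimal sketch: the helper names keep_line and rewrite_file are hypothetical, and it assumes the file names are passed on the command line instead of being globbed inside the script.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: true if the line should be kept.
# A line is dropped only when it contains === and does not contain TOTAL.
sub keep_line {
    my ($line) = @_;
    return !( $line =~ /===/ and $line !~ /TOTAL/ );
}

# Hypothetical helper: rewrite one file in place, keeping only wanted lines.
sub rewrite_file {
    my ($file) = @_;

    open my $in, '<', $file or die "Cannot read $file: $!";
    my @lines = grep { keep_line($_) } <$in>;
    close $in;

    open my $out, '>', $file or die "Cannot write $file: $!";
    print {$out} @lines;
    close $out or die "Cannot close $file: $!";

    return scalar @lines;    # number of lines kept
}

# Assumption: files are given as arguments, e.g. perl filter.pl *.csv
rewrite_file($_) for @ARGV;
```

Keeping the test in its own small sub means the rule can be changed (for example to an anchored pattern) without touching the file handling.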

Upvotes: 1

Arunesh Singh

Reputation: 3535

You need \A or ^ to check whether the string starts with =. Put an anchor in the regex, like:

next if /^===/;

or if only = is going to exist then:

next if /^=+/;

This skips every line beginning with =. The + matches one or more occurrences of the previous token.

Edit:

Then you could use a negative lookbehind, like:

next if /(?<!TOTAL)===/;

This ensures that === is not immediately preceded by TOTAL.

However, since any number of characters may occur between TOTAL and ===, I suggest using two regexes to check that the string contains === but does not contain TOTAL, like:

next if (($_ =~ /===/) && ($_ !~ /TOTAL/));
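To see the difference between the two approaches on the sample data from the question, here is a small sketch. The array names are made up for illustration; the lines are copied from the question.

```perl
use strict;
use warnings;

my @sample = (
    "========",
    "Line 1",
    "dfa====dsfdas==",
    "Line 2 ",
    "df  as TOTAL ============",
);

# The lookbehind only inspects the five characters directly before ===,
# so a space between TOTAL and === (or a longer run of =) still matches.
my @kept_lookbehind = grep { $_ !~ /(?<!TOTAL)===/ } @sample;

# Two separate tests: drop lines that contain === but not TOTAL.
my @kept_two_tests = grep { !( $_ =~ /===/ && $_ !~ /TOTAL/ ) } @sample;

print "lookbehind kept: ", scalar(@kept_lookbehind), " line(s)\n";
print "two tests kept:  ", scalar(@kept_two_tests),  " line(s)\n";
print "  $_\n" for @kept_two_tests;
```

On this input the lookbehind version drops the TOTAL line as well, while the two-test version keeps it.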

Upvotes: 2

Phyreprooph

Reputation: 517

Your current regex will pick up anything that contains === anywhere in the string.

Hello===      Match
===goodbye    Match
=======       Match
foo======bar  Match
===           Match
=             No Match
Hello==       No Match
=========     Match

If you want to ensure it picks up only strings made up entirely of = signs, then you need to anchor to the start and the end of the line and allow any number of = signs. A regex that does this is:

next if /^=+$/;

Each symbol's meaning:

^ The start of the string
= A literal "=" sign
+ One or more of the previous 
$ The end of the string

This will pick up a string of any length from the start of the string to the end of the string made up of only = signs.

Hello===      No Match
===goodbye    No Match
=======       Match
foo======bar  No Match
===           Match
=             Match
Hello==       No Match
=========     Match

I suggest you read up on Perl's regexes and what each symbol means. They can be a very powerful tool if you know what's going on. http://perldoc.perl.org/perlre.html#Regular-Expressions

EDIT: If you want to skip a line that contains = signs but keep it when it also contains TOTAL, then just put in 2 checks:

next if (/=+/ and not /TOTAL/);

This can probably be done with a single line of regex. But why bother making it complicated and less readable?
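For comparison, a single-regex version of the rule from the question (drop lines containing === unless they also contain TOTAL) might look like the sketch below. It uses a negative lookahead; the names skip_single and skip_two are hypothetical and exist only to put the two forms side by side.

```perl
use strict;
use warnings;

# Single-regex form: from the start of the line, assert that TOTAL appears
# nowhere on it, then require === somewhere.
my $single = qr/^(?!.*TOTAL).*===/;

# Both subs return 1 if the line should be skipped, 0 otherwise.
sub skip_single {
    my ($line) = @_;
    return $line =~ $single ? 1 : 0;
}

sub skip_two {
    my ($line) = @_;
    return ( $line =~ /===/ && $line !~ /TOTAL/ ) ? 1 : 0;
}

for my $line ( "========", "Line 1", "dfa====dsfdas==", "df  as TOTAL ============" ) {
    printf "%-28s single=%d two=%d\n", $line, skip_single($line), skip_two($line);
}
```

Both forms agree on these lines, but the two-check version states the rule far more directly.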

Upvotes: 0

ramana_k

Reputation: 1933

You can use a negative lookbehind assertion:

next if /(?<!TOTAL)===/;

This matches === when it is NOT immediately preceded by TOTAL.

Upvotes: 1
