Ben Coughlan

Reputation: 565

AWK find if line is newline or #

I have the following. It ignores lines containing just #, but not empty lines (those containing only a newline, \n).

Do you know of a way I can kill two birds with one stone? I.e., if a line doesn't contain more than one character, delete it.

function check_duplicates {

awk '
  FNR == 1 { files[FILENAME] }
  {
    if ((FILENAME, $0) in a)
      dupsInFile[FILENAME]
    else {
      a[FILENAME, $0]
      dups[$0] = $0 in dups ? (dups[$0] RS FILENAME) : FILENAME
      count[$0]++
    }
  }
  {
    if ($0 ~ /#/) {
      delete dups[$0]
    }
  }
  # Print duplicates in more than one file
  END {
    for (k in dups) {
      if (count[k] > 1) {
        print ("\n\nDuplicate line found: " k) " - In the following file(s)"
        print dups[k]
      }
    }
    printf "\n"
  }' $SITEFILES

awk '
  NR {
    b[$0]++
  }
  $0 in b {
    if ($0 ~ /#/) {
      delete b[$0]
    }
    if (b[$0] > 1) {
      print ("\n\nRepeated line found: " $0) " - In the following file"
      print FILENAME
      delete b[$0]
    }
  }' $SITEFILES

 }

The expected input is usually as follows.

 #File Path's
 /path/to/file1
 /path/to/file2
 /path/to/file3
 /path/to/file4



 #
 /more/paths/to/file1
 /more/paths/to/file2
 /more/paths/to/file3
 /more/paths/to/file4
 /more/paths/to/file5
 /more/paths/to/file5

In this case, /more/paths/to/file5 occurs twice and should be flagged as such.

However, there are also many empty lines, which I'd rather ignore.

Er, it also has to be awk: I'm doing a tonne of post-processing and don't want to stray from awk for this bit, if that's okay :)

It really seems to be a bit tougher than I would have expected.

Cheers, Ben

Upvotes: 0

Views: 1211

Answers (1)

nu11p01n73R

Reputation: 26667

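You can combine both checks into a single regex. Note that with the default record separator, awk strips the trailing newline from each record, so an empty line arrives as an empty $0 and has to be matched with ^$ rather than \n:

if ($0 ~ /#|^$/) {
    delete dups[$0]
}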

Or, to be more specific, you can write:

if ($0 ~ /^#?$/) {
    delete dups[$0]
}

What it does

  • ^ matches the start of the line.

  • #? matches zero or one #.

  • $ matches the end of the line.

So, ^$ matches empty lines and ^#$ matches lines with only #.
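
As a rough sketch of how that pattern could be used to skip those lines up front rather than deleting array entries afterwards: the function name find_repeats and the message wording below are made up for illustration and are not from the question. It assumes the default record separator, and that blank lines and # lines carry no surrounding whitespace.

```shell
# Hypothetical helper (not from the original post): report lines that occur
# more than once within a file, ignoring empty lines and lines that are just "#".
find_repeats() {
    awk '
        # Skip empty lines and lines consisting of a single "#".
        /^#?$/ { next }

        # Count each remaining line per file; report it once, on its second occurrence.
        ++count[FILENAME, $0] == 2 {
            print "Repeated line found: " $0 " - in " FILENAME
        }
    ' "$@"
}
```

It would be called as find_repeats $SITEFILES. If the blank or # lines may contain stray spaces or tabs, something like /^[[:space:]]*#?[[:space:]]*$/ would be needed instead, and /^#/ would skip every comment line rather than only bare # lines.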

Upvotes: 2
