A S

Reputation: 1235

Awk: skip a line from a paragraph

QUESTION (solutions follow)

Let's say the following script operates over several files and prints out a whole surrounding paragraph if the pattern 'TODO:' is found:

awk -v RS='' '{
    if(/TODO:/) {
        print
        print "\n"
    }
}' *.txt

Is it possible to print out these paragraphs in such a way that the lines from these paragraphs containing the pattern DONE: would get skipped?

If the following data is provided:

Apples
Oranges
Bananas

TODO: A
TODO: B
Lorem ipsum

Ad usu oporteat
TODO: C
DONE: D
TODO: E
Ipsum lorem

Then the output should not contain entry DONE: D, should not contain a paragraph with fruits (since there's no TODO: item there), and contain everything else:

TODO: A
TODO: B
Lorem ipsum

Ad usu oporteat
TODO: C
TODO: E
Ipsum lorem

(Sure, I can pipe | grep -v 'DONE:' but would like to learn a bit about awk here...)

SOLUTIONS and RESULTS:

First, by @EdMorton, a plain and clear improvement to the script I provided:

awk -v RS='' -v ORS='' 'FNR==1{td_file=0} {
    if(/TODO:/) {
        if (!td_file) {
            print "\n\n"
            f=FILENAME; sub(/\.txt$/, "", f)
            print f "\n"
            td_file=1
        }
        sub(/\n[^\n]*DONE:[^\n]*\n/,"\n")
        print
    }
}' *.txt

time report:

real    0m0.048s
user    0m0.029s
sys     0m0.018s
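
A note on substitutions of this kind: in paragraph mode the whole paragraph is one record, and in gawk and other POSIX awks `.` also matches a newline, so `.*` can run across several lines, whereas `[^\n]*` stays on a single line. The following is only a sketch using the third sample paragraph; the variable names `para`, `greedy` and `bounded` are made up for illustration.

```shell
# Third sample paragraph from the question
para=$(printf '%s\n' 'Ad usu oporteat' 'TODO: C' 'DONE: D' 'TODO: E' 'Ipsum lorem')

# ".*" may cross line boundaries, so the match can start at the first newline
# of the paragraph and take earlier lines along with the DONE: line
greedy=$(printf '%s\n' "$para" | awk -v RS='' '{ sub(/\n.*DONE:[^\n]*\n/, "\n"); print }')

# "[^\n]*" cannot cross a newline, so only the DONE: line itself is removed
bounded=$(printf '%s\n' "$para" | awk -v RS='' '{ sub(/\n[^\n]*DONE:[^\n]*\n/, "\n"); print }')

printf 'with .*:\n%s\n\nwith [^\\n]*:\n%s\n' "$greedy" "$bounded"
```

Both patterns still need a newline on each side of the DONE: line, so a DONE: entry on the first or last line of a paragraph is left in place, and `sub` removes only one entry per paragraph.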

Second, by @RavinderSingh13, as I understand it and after some cleanup:

awk '
# On the first line of each file, reset the td_file marker to false
FNR==1{td_file=0}{
# If this line contains "TODO:" and the file name has not been printed yet
    if(/TODO:/ && !td_file) {
# Print out FILENAME
        print "\n" FILENAME
# Mark the file as processed, so that FILENAME is not printed twice
        td_file=1
    }
}
# At the start of a new file OR on an empty line (number of fields is 0)
FNR==1 || !NF{
# If the collected paragraph has a TODO entry (td_entr is true) and the container cont is not empty
    if (td_entr && cont) {
# Then print it out
        print cont
    }
# And reset the variables
    cont=td_entr=""
}
# If the current line contains "TODO:"
/TODO:/ {
# Set the td_entr marker to 1
    td_entr=1
}
# If the current line does not contain "DONE:"
!/DONE:/ {
# Append the current line to cont, preceded by the output record separator (ORS) when cont is not empty
    cont=cont?cont ORS $0:$0
}
# At the end of input, print the last paragraph if it is still pending
END{
    if (td_entr && cont) {
        print cont
    }
}
' *.txt

In my testing, time reports that this version takes longer (which isn't exactly surprising, if I understand the algorithm correctly):

real    0m0.090s
user    0m0.065s
sys     0m0.022s

Given this comparison (and since the first solution is based directly on the little script I provided with my question), I accepted @EdMorton's reply as the answer. Nonetheless, I'm extremely grateful to both participants, thank you (I did learn something today :)!

Upvotes: 2

Views: 109

Answers (2)

RavinderSingh13

Reputation: 133600

EDIT: Since the OP has added more details to the post, I am adding the following solution.

awk 'prev!=FILENAME{if(found && val){print val};val=found="";prev=FILENAME}!NF{if(val && found){print val};val=found=""} /^TODO/{found=1} !/DONE:/{val=val?val ORS $0:$0} END{if(val && found){print val}}'  *.txt

Explanation: A complete explanation of the above code follows.

awk '
prev!=FILENAME{               ##Check whether prev differs from FILENAME (a built-in awk variable that contains the name of the current input file).
  if(found && val){           ##If a new input file is being read and both found and val are set, do the following.
    print val                 ##Print the variable val.
  }
  val=found=""                ##Reset the variables val and found.
  prev=FILENAME               ##Set prev to FILENAME (the name of the current input file).
}
!NF{                          ##If the line has no fields (it is empty or contains only whitespace), do the following.
  if(val && found){           ##If both val and found are set, do the following.
    print val                 ##Print the variable val.
  }
  val=found=""                ##Reset the variables val and found.
}
/^TODO/{                      ##If the line starts with TODO, do the following.
  found=1                     ##Set found to 1.
}
!/DONE:/{                     ##If the line does not contain the string DONE:, do the following.
  val=(val?val ORS $0:$0)     ##Append the current line to val, separated by ORS when val is not empty.
}
END{                          ##The END block of this awk program.
  if(val && found){           ##If both val and found are set, do the following.
    print val                 ##Print the variable val.
  }
}' *.txt                      ##Process all *.txt files.

In the above I am assuming that you want to print only from TODO up to the Ipsum string, and that any line in between that contains DONE: D is skipped as well.



A simple awk solution would be:

awk '!/DONE: D/' Input_file

Explanation: Here we check whether a line does NOT contain the string DONE: D and, if so, print it. No action is written for the case where the condition is true; an awk program consists of condition/action pairs, and when no action is defined, the default action is to print the current line.
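
To illustrate the default action, here is a small sketch that runs the same filter twice on one sample paragraph, once without an action and once with the print written out. It uses the shorter pattern DONE: rather than DONE: D, and the names `tmp`, `out_implicit` and `out_explicit` are made up for the example.

```shell
# One sample paragraph from the question, written to a temporary file
tmp=$(mktemp)
printf '%s\n' 'Ad usu oporteat' 'TODO: C' 'DONE: D' 'TODO: E' 'Ipsum lorem' > "$tmp"

# Pattern only, no action: awk prints every line for which the pattern is true
out_implicit=$(awk '!/DONE:/' "$tmp")

# The same program with the default action spelled out
out_explicit=$(awk '!/DONE:/ { print $0 }' "$tmp")

printf '%s\n' "$out_implicit"
rm -f "$tmp"
```

Both commands should produce the same four lines, with the DONE: D entry left out.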

Upvotes: 2

Ed Morton

Reputation: 203985

$ awk -v RS= -v ORS='\n\n' '/TODO:/{sub(/\nDONE: D\n/,"\n"); print}' file
TODO: A
TODO: B
Lorem ipsum

Ad usu oporteat
TODO: C
TODO: E
Ipsum lorem
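
The command above removes the literal entry DONE: D. As a possible generalization (a sketch, not part of the original answer), the paragraph can be split into lines by setting the field separator to a newline and every line containing DONE: can be skipped, wherever it sits in the paragraph. The names `tmp`, `out`, `kept` and `n` are made up for the example.

```shell
# Sample data from the question, written to a temporary file
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Apples
Oranges
Bananas

TODO: A
TODO: B
Lorem ipsum

Ad usu oporteat
TODO: C
DONE: D
TODO: E
Ipsum lorem
EOF

# RS='' reads paragraph by paragraph; -F'\n' makes every line a field
out=$(awk -v RS='' -F'\n' '/TODO:/ {
    kept = ""
    for (i = 1; i <= NF; i++)          # walk over the lines of the paragraph
        if ($i !~ /DONE:/)             # keep only lines without DONE:
            kept = (kept == "" ? $i : kept "\n" $i)
    if (n++) print ""                  # blank line between printed paragraphs
    print kept
}' "$tmp")

printf '%s\n' "$out"
rm -f "$tmp"
```

For the sample data this should print the two TODO paragraphs separated by one blank line, without the fruit paragraph and without DONE: D.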

Upvotes: 1
