Reputation: 43

Using AWK or SED, add line with semicolon in it before regexp

Groups of lines in file:

Types       datadata
Term        datadata
Vendor      datadata
Feature     datadata

Types       datadata
Term        datadata
Feature     datadata

Types       datadata
Feature     datadata

The first group of lines is fine. In the second group, the Vendor line is missing, so I need to add a placeholder line (a single semicolon) in its place. In the third group, both Term and Vendor are missing, so I need to add two placeholder lines. I've searched for similar questions but can't find the right solution. These are headers for data that follows to the right on the same line, so I can't use a simple find and replace.

Result wanted:

Types      datadata
Term       datadata
Vendor     datadata
Feature    datadata

Types      datadata
Term       datadata
;
Feature    datadata

Types      datadata
;
;
Feature    datadata

Upvotes: 0

Answers (3)

Ed Morton

Reputation: 204258

$ cat tst.awk
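# Store each line under its first field (the tag).
{ tag2val[$1] = $0 }
# An empty line ends a group, so print it.
!NF { prt() }
# Print a final group that is not followed by an empty line.
END { prt() }

# Print the four tags in a fixed order, with ";" for any tag not seen in this group.
function prt(   n,i,tag,tags) {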
    n = split("Types Term Vendor Feature",tags)
    for (i=1; i<=n; i++) {
        tag = tags[i]
        print (tag in tag2val ? tag2val[tag] : ";")
    }
    print ""
    delete tag2val
}

$ awk -f tst.awk file
Types       datadata
Term        datadata
Vendor      datadata
Feature     datadata

Types       datadata
Term        datadata
;
Feature     datadata

Types       datadata
;
;
Feature     datadata

Upvotes: 0

Daweo

Reputation: 36680

I would do it the following way. Let the content of file.txt be:

Types       datadata
Term        datadata
Vendor      datadata
Feature     datadata

Types       datadata
Term        datadata
Feature     datadata

Types       datadata
Feature     datadata

then

awk 'BEGIN{arr["Types"]=arr["Term"]=arr["Vendor"]=arr["Feature"]=";"}
(NF>=1){arr[$1]=$0}
(NF<1){print arr["Types"];print arr["Term"];print arr["Vendor"];print arr["Feature"];print $0;
arr["Types"]=arr["Term"]=arr["Vendor"]=arr["Feature"]=";"}
END{print arr["Types"];print arr["Term"];print arr["Vendor"];print arr["Feature"]}' file.txt

output

Types       datadata
Term        datadata
Vendor      datadata
Feature     datadata

Types       datadata
Term        datadata
;
Feature     datadata

Types       datadata
;
;
Feature     datadata

Explanation: In BEGIN I create arr with the keys Types, Term, Vendor and Feature, all set to ;. For every line with data, I set the corresponding arr value to that line. For every line without data, I print the contents of arr in the required order, then the empty line itself, and reset everything in arr to ;. In END I print arr the same way as for a line without data. This is needed only when the last line of your data is not empty; if the file ends with an empty line, END prints an extra group of four ; lines.

(tested in gawk 4.2.1)
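
If the input may or may not end with an empty line, one way to cope is to remember whether the current group has received any data and print only in that case. The following is a minimal sketch of that idea, not part of the command above: the shell function name fill_groups, the awk functions reset_group and emit_group, and the seen flag are all names made up for illustration. As a side effect, runs of several empty lines collapse into one.

```shell
# Sketch only: fill_groups, reset_group, emit_group and seen are illustrative names.
fill_groups() {
    awk '
    # Set every slot back to the placeholder and mark the group as empty.
    function reset_group() {
        arr["Types"] = arr["Term"] = arr["Vendor"] = arr["Feature"] = ";"
        seen = 0
    }
    # Print the four slots in the required order.
    function emit_group() {
        print arr["Types"]; print arr["Term"]; print arr["Vendor"]; print arr["Feature"]
    }
    BEGIN   { reset_group() }
    NF >= 1 { arr[$1] = $0; seen = 1 }
    NF < 1  {
        if (seen) { emit_group(); print $0 }   # print only groups that had data
        reset_group()
    }
    END     { if (seen) emit_group() }         # skipped when the file ends with an empty line
    ' "$@"
}

# Input that ends with an empty line: no extra group of semicolons is printed.
printf 'Types       datadata\nFeature     datadata\n\n' | fill_groups
```

The function reads standard input when called without arguments, so it can also be run as fill_groups file.txt.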

Upvotes: 0

Jonathan Leffler

Reputation: 754650

I created a file, script.awk, containing this code:

function print_key(key)
{
    if (key in saved)
        print saved[key]
    else
        print ";"
}
function process_data()
{
    if (nsaved > 0)
    {
        print_key("Types")
        print_key("Term")
        print_key("Vendor")
        print_key("Feature")
        print ""
    }
    delete saved
    nsaved = 0
}
NF > 0  { saved[$1] = $0; nsaved++ }
NF == 0 { process_data() }
END     { process_data() }

I ran it on your sample data (contained in a file called data):

$ awk -f script.awk data
Types       datadata
Term        datadata
Vendor      datadata
Feature     datadata

Types       datadata
Term        datadata
;
Feature     datadata

Types       datadata
;
;
Feature     datadata

$

What the script does is:

Define a function that takes one argument, key. If that key is present in the array saved, the line saved under it is printed; if not, a semicolon is printed on a line of its own.
Define a function that looks to see if there is any data to be printed, and if so, prints each of the keys "Types", "Term", "Vendor" and "Feature", and then an empty line. It also zaps the saved data, deleting the array saved and setting nsaved back to zero.
When the number of fields is greater than 0, the line is non-empty. The first field is used as a key to the saved array and saves the whole line for later printing.
When the number of fields is 0, the line is empty, and the previous block of data is processed.
At the end of the input, the previous block of data is processed. If there was a blank line at the end of the file, there'll be no data to be processed.
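
The same steps can also be sketched with awk's paragraph mode, which is a different technique from the script above: setting RS to the empty string makes each blank-line-separated group one record, and with FS set to a newline each line of the group becomes one field. This is only a sketch under assumptions: the shell function name fill_groups_paragraph and the arrays tags, saved and words are illustrative names, and a line containing only spaces does not count as a group separator in this mode.

```shell
# Sketch only: fill_groups_paragraph, tags, saved and words are illustrative names.
fill_groups_paragraph() {
    awk '
    BEGIN {
        RS = ""                 # paragraph mode: blank lines separate records
        FS = "\n"               # each line of a group is one field
        n = split("Types Term Vendor Feature", tags, " ")
    }
    {
        split("", saved)                    # portable way to empty the array
        for (i = 1; i <= NF; i++) {
            split($i, words, " ")           # first word on the line is the tag
            saved[words[1]] = $i
        }
        if (NR > 1) print ""                # blank line between groups
        for (i = 1; i <= n; i++) {
            tag = tags[i]
            print (tag in saved ? saved[tag] : ";")
        }
    }
    ' "$@"
}

# Two groups separated by two blank lines; the run of blank lines counts once.
printf 'Types a\nFeature b\n\n\nTerm c\n' | fill_groups_paragraph
```

Because the groups are delimited by the record separator, there is no separate END rule and no counter to maintain, and a trailing blank line in the file produces no extra output.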

Upvotes: 1
