Can awk replace fields based on a separate specification file?

Question

I have an input file like this:

SomeSection.Foo
OtherSection.Foo
OtherSection.Goo

...and there is another file describing which object(s) belong to each section:

[SomeSection]
Blah
Foo
[OtherSection]
Foo
Goo

The desired output would be:

SomeSection.2   // that's because Foo appears 2nd in SomeSection
OtherSection.1  // that's because Foo appears 1st in OtherSection
OtherSection.2  // that's because Goo appears 2nd in OtherSection

(The numbers and names of sections and objects are variable)

How would you do such a thing in awk?

Thanks in advance, Adrian.

Birei · Accepted Answer

One possibility:

Content of script.awk (with comments):

## When 'FNR == NR', the first input file (object.txt) is being processed.
## If the line begins with '[', extract the section name (stored in 'object')
## and reset the position counter for the objects of that section.
FNR == NR && $0 ~ /^\[/ {
        object = substr( $0, 2, length($0) - 2 )
        pos = 0
        next
}

## This block processes the objects of each section and saves them in
## an array. 'pos' is incremented for each object, so every entry holds
## the object's 1-based position within its section.
FNR == NR {
        arr_obj[object, $0] = ++pos
        next
}

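## This block processes the second file. It splits each line on '.',
## looks up the (section, object) pair in the array and prints the section
## followed by the object's position. Lines that do not split into exactly
## two parts are skipped.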
FNR < NR {
        ret = split( $0, obj, /\./ )
        if ( ret != 2 ) {
                next
        }
        printf "%s.%d\n", obj[1], arr_obj[ obj[1] SUBSEP obj[2] ]
}

Run the script. The order of the input files matters: object.txt (the sections with their objects) must come first, followed by input.txt (the references to resolve):

awk -f script.awk object.txt input.txt

Result:

SomeSection.2
OtherSection.1
OtherSection.2
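
One thing to be aware of: if input.txt names an object that is not listed under that section in object.txt, the lookup yields an empty value and "%d" prints it as 0. The following is only a sketch of one way to guard against that, using awk's "(i, j) in array" membership test. The temporary file handling, the OtherSection.Zzz test line and the "(unknown)" marker are my own illustrative choices, not part of the original script:

```shell
lookup_demo() {
    tmp=$(mktemp -d) || return 1
    # Sample data from the question, plus one reference (Zzz) that does not exist.
    printf '%s\n' '[SomeSection]' 'Blah' 'Foo' '[OtherSection]' 'Foo' 'Goo' > "$tmp/object.txt"
    printf '%s\n' 'SomeSection.Foo' 'OtherSection.Zzz' > "$tmp/input.txt"
    awk '
        # First file: remember the current section and number its objects.
        FNR == NR && /^\[/ { object = substr($0, 2, length($0) - 2); pos = 0; next }
        FNR == NR          { arr_obj[object, $0] = ++pos; next }
        # Second file: only lines of the form Section.Object are handled.
        split($0, obj, /\./) == 2 {
            if ((obj[1], obj[2]) in arr_obj)
                printf "%s.%d\n", obj[1], arr_obj[obj[1], obj[2]]
            else
                printf "%s.%s (unknown)\n", obj[1], obj[2]   # hypothetical marker
        }
    ' "$tmp/object.txt" "$tmp/input.txt"
    rm -rf "$tmp"
}
lookup_demo
```

The "in" test has the side benefit of not creating an empty array element for the missing key, which a plain arr_obj[...] reference would do.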

EDIT to a question in comments:

I'm not an expert but I will try to explain how I understand it:

SUBSEP is the character awk uses to separate the indexes of an array element when you use several values as its key. By default it is \034, and you can change it just like RS or FS.

In the instruction arr_obj[object, $0] = ++pos the comma joins the values with the value of SUBSEP, so in this case the result is:

arr_obj[SomeSection\034Blah] = 1

At the end of the script I access the index using that variable explicitly, arr_obj[ obj[1] SUBSEP obj[2] ], which means the same as the arr_obj[object, $0] form used in the previous block.
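
To convince yourself that the two spellings address the same element, a small check along these lines may help. It is a standalone sketch, and the function name subsep_demo and the printed labels are mine:

```shell
subsep_demo() {
    awk 'BEGIN {
        arr["SomeSection", "Blah"] = 1                 # comma form
        print arr["SomeSection" SUBSEP "Blah"]         # explicit form reads the same element
        if (("SomeSection", "Blah") in arr)            # membership test on a compound key
            print "present"
        if (SUBSEP == "\034")                          # separator has not been changed
            print "default SUBSEP"
    }'
}
subsep_demo
```

With an unmodified SUBSEP this should print 1, then "present", then "default SUBSEP".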

You can also recover each part of the index by splitting the key on the SUBSEP variable, like this:

for (key in arr_obj) {                     ## Assign 'string\034string' to 'key' variable
    split( key, key_parts, SUBSEP )        ## Split 'key' with the content of SUBSEP variable.
    ...
}

with a result of:

key_parts[1] -> SomeSection
key_parts[2] -> Blah
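
For completeness, here is a runnable version of that loop, filled in with a few entries from the question's data. Treat it as a sketch: the function name list_keys and the output format are my own, and the output is piped through sort because awk does not guarantee any particular order for "for (key in array)":

```shell
list_keys() {
    awk 'BEGIN {
        # Same shape of data the main script builds from object.txt.
        arr_obj["SomeSection", "Blah"] = 1
        arr_obj["SomeSection", "Foo"]  = 2
        arr_obj["OtherSection", "Foo"] = 1
        for (key in arr_obj) {
            split(key, key_parts, SUBSEP)      # key_parts[1] = section, key_parts[2] = object
            printf "%s %s %d\n", key_parts[1], key_parts[2], arr_obj[key]
        }
    }' | LC_ALL=C sort
}
list_keys
```

Each output line shows the section, the object and the stored position, for example "SomeSection Foo 2".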
