Reputation: 817
I have an input file like this:
SomeSection.Foo
OtherSection.Foo
OtherSection.Goo
...and there is another file describing which object(s) belong to each section:
[SomeSection]
Blah
Foo
[OtherSection]
Foo
Goo
The desired output would be:
SomeSection.2 // that's because Foo appears 2nd in SomeSection
OtherSection.1 // that's because Foo appears 1st in OtherSection
OtherSection.2 // that's because Goo appears 2nd in OtherSection
(The numbers and names of sections and objects are variable)
How would you do such a thing in awk?
Thanks in advance, Adrian.
Upvotes: 2
Views: 216
Reputation: 36252
One possibility:
Content of script.awk (with comments):
## When 'FNR == NR', the first input file is being processed.
## If the line begins with '[', extract the section name and reset the
## position counter for its objects.
FNR == NR && $0 ~ /^\[/ {
    object = substr( $0, 2, length($0) - 2 )
    pos = 0
    next
}
## This block processes the objects of each section. It saves them in
## an array; 'pos' is incremented for each object processed.
FNR == NR {
    arr_obj[object, $0] = ++pos
    next
}
## This block processes the second file. It splits each line on '.',
## looks up the section and object in the array, and prints the result.
FNR < NR {
    ret = split( $0, obj, /\./ )
    if ( ret != 2 ) {
        next
    }
    printf "%s.%d\n", obj[1], arr_obj[ obj[1] SUBSEP obj[2] ]
}
Run the script (the order of the input files matters: object.txt holds the sections with their objects, and input.txt holds the references to resolve):
awk -f script.awk object.txt input.txt
Result:
SomeSection.2
OtherSection.1
OtherSection.2
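One thing to be aware of: if input.txt names an object that is not listed in its section, the script above prints position 0, because the missing array element is formatted with %d. A minimal sketch of one way to guard against that, using awk's `(i, j) in array` test, is shown here. The extra line OtherSection.Missing, the temporary directory, and the wording of the warning are assumptions added only for this demonstration.

```shell
# Build the sample files from the question in a temporary directory.
tmp=$(mktemp -d)
cat > "$tmp/object.txt" <<'EOF'
[SomeSection]
Blah
Foo
[OtherSection]
Foo
Goo
EOF
cat > "$tmp/input.txt" <<'EOF'
SomeSection.Foo
OtherSection.Foo
OtherSection.Goo
OtherSection.Missing
EOF

# Same logic as script.awk, but unknown objects are reported on stderr
# instead of being printed with position 0.
out=$(awk '
FNR == NR && /^\[/ { section = substr($0, 2, length($0) - 2); pos = 0; next }
FNR == NR          { arr_obj[section, $0] = ++pos; next }
{
    if (split($0, obj, /\./) != 2) next
    if ((obj[1], obj[2]) in arr_obj)      # membership test, creates no element
        printf "%s.%d\n", obj[1], arr_obj[obj[1], obj[2]]
    else
        print "unknown object: " $0 > "/dev/stderr"
}' "$tmp/object.txt" "$tmp/input.txt")

printf '%s\n' "$out"
rm -r "$tmp"
```

The three known references are printed as before, and the unknown one only produces a warning on standard error.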
EDIT in response to a question in the comments:
I'm not an expert, but I will try to explain it as I understand it:
SUBSEP is the character awk uses to separate the parts of an array index when you use several values as one key. Its default value is \034, although you can change it, just like RS or FS.
In the instruction arr_obj[object, $0] = ++pos, the comma joins the values with the value of SUBSEP, so in this case the result would be:
arr_obj[SomeSection\034Blah] = 1
At the end of the script I access the index using that variable explicitly, arr_obj[ obj[1] SUBSEP obj[2] ], which has the same meaning as the arr_obj[object, $0] form in the previous block.
You can also get at each part of the index by splitting it on the SUBSEP variable, like this:
for (key in arr_obj) { ## Assign 'string\034string' to 'key' variable
split( key, key_parts, SUBSEP ) ## Split 'key' with the content of SUBSEP variable.
...
}
with a result of:
key_parts[1] -> SomeSection
key_parts[2] -> Blah
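To see the two spellings side by side, here is a small self-contained sketch that needs no input files. The keys and values are taken from the example above; the printed messages are just illustrative wording.

```shell
out=$(awk 'BEGIN {
    arr["SomeSection", "Blah"] = 1         # comma form: key is "SomeSection" SUBSEP "Blah"
    arr["SomeSection" SUBSEP "Foo"] = 2    # explicit form builds the same kind of key

    # Membership test for a multi-part key; it does not create the element.
    if (("SomeSection", "Foo") in arr)
        print "Foo is at position " arr["SomeSection", "Foo"]
    if (!(("OtherSection", "Blah") in arr))
        print "OtherSection.Blah is not defined"

    # An element stored with the comma form can be read with the explicit form.
    print arr["SomeSection" SUBSEP "Blah"]
}')
printf '%s\n' "$out"
```

Using the `in` test rather than reading the element directly avoids creating empty entries as a side effect of the lookup.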
Upvotes: 3
Reputation: 195029
This awk command should do the job:
awk 'BEGIN{FS="[\\.\\]\\[]"}
NR==FNR{ if(NF>1){ i=1; idx=$2; }else{ s[idx"."$1]=i; i++; } next; }
{ if($0 in s) print $1"."s[$0] } ' f2 input
See the test below:
kent$ head input f2
==> input <==
SomeSection.Foo
OtherSection.Foo
OtherSection.Goo
==> f2 <==
[SomeSection]
Blah
Foo
[OtherSection]
Foo
Goo
kent$ awk 'BEGIN{FS="[\\.\\]\\[]"}
NR==FNR{ if(NF>1){ i=1; idx=$2; }else{ s[idx"."$1]=i; i++; } next; }
{ if($0 in s) print $1"."s[$0] } ' f2 input
SomeSection.2
OtherSection.1
OtherSection.2
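For readers who find the one-liner dense, here is the same program spread over several lines with comments. This is only a reformatted sketch of the command above, not a different method; the temporary directory is an assumption made so the snippet can run on its own, while the file names f2 and input follow the test above.

```shell
# Recreate the two sample files in a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' '[SomeSection]' Blah Foo '[OtherSection]' Foo Goo > "$tmp/f2"
printf '%s\n' SomeSection.Foo OtherSection.Foo OtherSection.Goo > "$tmp/input"

out=$(awk '
BEGIN { FS = "[\\.\\]\\[]" }     # split fields on ".", "]" or "["
NR == FNR {                      # first file (f2): section definitions
    if (NF > 1) {                # "[Name]" splits into "", "Name", ""
        i = 1                    # restart the position counter
        idx = $2                 # remember the section name
    } else {
        s[idx "." $1] = i        # key looks like "Section.Object"
        i++
    }
    next
}
{                                # second file (input): lookups
    if ($0 in s)                 # the whole line is exactly the key built above
        print $1 "." s[$0]
}' "$tmp/f2" "$tmp/input")

printf '%s\n' "$out"
rm -r "$tmp"
```

Because the key is built with a literal "." instead of SUBSEP, each input line can be looked up as-is, and lines that match no key are skipped.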
Upvotes: 2