Svend Bager

Reputation: 11

Numbering lines in sections of file with repeating pattern

I am trying to use awk to number repeated lines matching the pattern scan-hgi-oi.[0-9][0-9][0-9].out4 in a larger file.

The closest I have come to success is the following command:

awk 'BEGIN{i=0}{if ($1="scan-hgi-oi.[0-9][0-9][0-9].out4") { i=i+1}   printf"%i\n","%f",i,$1}' test2 > test3

This only seems to replace every line with the number 0.

The reason why I would like to use awk and not sed for this problem is that the number of repetitions of the pattern is different in each section.

The file has sections looking as follows:

xxxxxxx\
yyyyyyy\
zzzzzz\
scan-hgi-oi.001.out4 number\
scan-hgi-oi.001.out4 number\
scan-hgi-oi.001.out4 number\
ppppppp
xxxxxx\
yyyyyyy\
zzzzzzz\
scan-hgi-oi.002.out4 number\
scan-hgi-oi.002.out4 number\
scan-hgi-oi.002.out4 number\
scan-hgi-oi.002.out4 number\
ppppppp

I would like to get the result beneath.

xxxxxx\
yyyyyyy\
zzzzzzz\
1 number\
2 number\
3 number\
ppppppp
xxxxxx\
yyyyyyy\
zzzzzzz\
1 number\
2 number\
3 number\
4 number\
ppppppp

Hope you can help.

With kind regards from Svend

Upvotes: 1

Views: 36

Answers (1)

markp-fuso

Reputation: 35106

Issues with current awk code:

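  • $1="scan-hgi-oi.[0-9][0-9][0-9].out4" is an assignment (single =); a string comparison needs a double = ($1=="scan...."), and because the right-hand side here is a regex rather than a literal string, the match operator is what is actually needed ($1 ~ /scan.../) (though the code would also need to be modified to deal with the trailing \)
  • if ($1="scan ...") (assignment, not test) evaluates to the assigned value, a non-empty string, which is always 'true'; so i=i+1 is executed on every line, and $1 is overwritten on every line as well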
  • printf "%i\n","%f",i,$1 - the format string ("%i\n") only has one placeholder so of the 3 arguments ("%f", i, $1) only the "%f" is used and since the string "%f" is an invalid integer the %i ends up being replaced with 0 on all invocations ...
  • hence all output lines are 0
  • while fixing the printf call is doable a bit more code is needed to address the conditional printing of the current line vs the replacement line
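The printf point can be checked in isolation. The sketch below feeds one made-up line to awk twice: once with the original argument list, and once with one placeholder per argument. The literal counter value 1 and the shell variable names broken and fixed are assumptions for illustration only.

```shell
line='scan-hgi-oi.001.out4 number'

# One placeholder, three arguments: only "%f" is consumed, and as a number it is 0
broken=$(echo "$line" | awk '{ printf "%i\n", "%f", 1, $1 }')

# One placeholder per argument: the counter and the second field are both printed
fixed=$(echo "$line" | awk '{ printf "%d %s\n", 1, $2 }')

echo "broken: $broken"
echo "fixed:  $fixed"
```

The extra arguments in the first call are silently ignored, which is why the original command produced no error message.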

One idea for reworking the current code:

$ awk '/scan-hgi-oi\.[0-9]{3}\.out4/ {print ++i,"number\\"; next} {i=0; print}' sample.dat
xxxxxxx\
yyyyyyy\
zzzzzz\
1 number\
2 number\
3 number\
ppppppp
xxxxxx\
yyyyyyy\
zzzzzzz\
1 number\
2 number\
3 number\
4 number\
ppppppp
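If number in the sample stands for real data that differs from line to line, printing a fixed string would lose it. A possible variation, sketched here under that assumption, replaces only the leading file name with the counter and leaves the rest of the line untouched. The function name number_scan_lines and the numeric values in the demo input are made up for this example. The regex spells out [0-9][0-9][0-9] because some older awk implementations do not support the {3} interval syntax.

```shell
# number_scan_lines is a hypothetical helper name used only for this sketch
number_scan_lines() {
  awk '
    # Matching line: swap the leading file name for a running count, keep the rest
    /^scan-hgi-oi\.[0-9][0-9][0-9]\.out4/ { sub(/^[^ \t]+/, ++i); print; next }
    # Any other line ends the run, so the count restarts
    { i = 0; print }
  ' "$@"
}

# Demo input with invented values in place of "number"
printf '%s\n' 'xxxxxxx\' 'scan-hgi-oi.001.out4 12.5\' 'scan-hgi-oi.001.out4 13.0\' 'ppppppp' \
              'scan-hgi-oi.002.out4 7.25\' | number_scan_lines
```

With a file, the call would be number_scan_lines sample.dat. Using sub() rather than assigning to $1 keeps the original spacing of each line, since awk does not rebuild the record.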

Upvotes: 1
