onlyf

Reputation: 873

awk multiple lines stored into variable

Good day,

I have a file in the following format :

XXXXXXXXXXXYYYYYYYYAAAAAAAA
XXXXXXXXXXXIIIIIIII22222222
XXXXXXXXXXXOOOOOOOOPPPPPPPP
XXXXXXXXXXXAAAAAAAAKKKKKKKK
YYYYYYYYYYY22222222AAAAAAAA
YYYYYYYYYYY55555555BBBBBBBB
YYYYYYYYYYYGGGGGGGGKKKKKKKK
YYYYYYYYYYYQQQQQQQQ88888888

... and so on. Every 4 lines the first part (X, Y, ...) remains the same, the rest of the line changes. There is no separator between the lines, and the file is quite big.

I would like to find a way to use awk to read 4 lines at a time and store them in 4 variables (and/or set the RS to \n and the FS to something), because I would like to do comparisons within specific 4-line blocks and be able to output all 4 lines on a match.

That is, if substr(17,3) == X, output all 4 records that were read.

My apologies for not supplying code, but I really have no idea how to do this with awk.

Given a specific number, e.g. Y=17, the script would compare it to a given substring of each record. For example:

if (substr($0,11,2) == 17) then    # This can match on any line of a 4-line group (e.g. X...)
print (all 4 lines - all X...) - or print a given substring of those lines.

An actual example with the sample provided:

if (substr($0,21,2) == "PP") { print all 4 lines in memory }

...and it would print :

XXXXXXXXXXXYYYYYYYYAAAAAAAA
XXXXXXXXXXXIIIIIIII22222222
XXXXXXXXXXXOOOOOOOOPPPPPPPP
XXXXXXXXXXXAAAAAAAAKKKKKKKK

Upvotes: 2

Views: 2241

Answers (1)

tripleee

Reputation: 189317

The following simple script should hopefully be useful at least as a start.

awk 'substr($0,21,2) == "PP" { p=1 } # remember match
    NR % 4 { a[NR%4] = $0; next }  # collect lines a[1] through a[3]
    # We have read four lines, and are ready to print if there was a match
    p { for (i=1; i<4; ++i) print a[i]; print $0;
        # reset for next iteration
        p=0 }' filename

The first condition is tested on all input lines. If there is a match on any of them, we remember this by setting the flag variable p to 1 (anything non-zero will do, really). The condition could be a regex just as well; /^.{20}PP/ looks for "PP" in the 21st position.
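As a rough sketch of the regex form: some older awk implementations reportedly do not accept interval expressions such as {20}, so the snippet below builds the equivalent pattern out of literal dots instead. The variable names pos and want are my own, just for illustration, and the two input lines are taken from the question's sample. Note that want is interpreted as a regex here, so metacharacters in it would need escaping.

```shell
# Third and fourth sample lines from the question; only the first has "PP" at position 21.
lines='XXXXXXXXXXXOOOOOOOOPPPPPPPP
XXXXXXXXXXXAAAAAAAAKKKKKKKK'

# Build "^" + (pos-1 dots) + want, i.e. the same thing as /^.{20}PP/ for pos=21.
# pos and want are illustrative names passed in with -v.
by_regex=$(printf '%s\n' "$lines" | awk -v pos=21 -v want=PP '
    BEGIN { pat = "^"; for (i = 1; i < pos; i++) pat = pat "."; pat = pat want }
    $0 ~ pat')

# The substr() form from the script, for comparison.
by_substr=$(printf '%s\n' "$lines" | awk 'substr($0, 21, 2) == "PP"')

echo "$by_regex"
echo "$by_substr"
```

Both commands should select the same line, which is the point: either form can serve as the first condition.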

The second condition fires on lines whose line number is not a multiple of 4. We simply collect these lines, and (by way of the next statement) skip the remainder of the script. (As you probably know, the % modulo operator calculates the remainder from division; so NR%4 goes 1, 2, 3, 0 and then repeats that cycle.)
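If you want to see the cycle for yourself, a throwaway one-liner like this prints NR%4 for eight dummy input lines (the letters are just placeholder input, not from your file):

```shell
# Feed eight dummy lines to awk and print NR % 4 for each, space-separated.
cycle=$(printf '%s\n' a b c d e f g h |
    awk '{ printf "%s%d", (NR > 1 ? " " : ""), NR % 4 } END { print "" }')
echo "$cycle"
```

The zero lands on every fourth line, which is exactly where the script stops collecting and decides whether to print.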

Thus, if we fall through to the third condition, it means we are on a line whose line number is divisible by 4; now, the condition examines the value of p, and if it's nonzero, the action is taken.

(If it's zero, we fall through without printing anything, and the cycle starts over with NR%4 equal to 1.)
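Since you mention matching a given value at a given position, here is a minimal sketch of the same script with the position, length and value passed in via -v instead of being hard-coded. The names pos, len and want are hypothetical, and the temporary file simply holds the eight sample lines from the question.

```shell
# Put the question's sample lines into a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
XXXXXXXXXXXYYYYYYYYAAAAAAAA
XXXXXXXXXXXIIIIIIII22222222
XXXXXXXXXXXOOOOOOOOPPPPPPPP
XXXXXXXXXXXAAAAAAAAKKKKKKKK
YYYYYYYYYYY22222222AAAAAAAA
YYYYYYYYYYY55555555BBBBBBBB
YYYYYYYYYYYGGGGGGGGKKKKKKKK
YYYYYYYYYYYQQQQQQQQ88888888
EOF

# pos, len and want are illustrative parameter names, not anything awk predefines.
result=$(awk -v pos=21 -v len=2 -v want=PP '
    substr($0, pos, len) == want { p = 1 }   # remember match
    NR % 4 { a[NR % 4] = $0; next }          # collect lines a[1] through a[3]
    # fourth line of the group: print the whole group if any line matched
    p { for (i = 1; i < 4; ++i) print a[i]; print $0; p = 0 }
' "$sample")

echo "$result"
rm -f "$sample"
```

With the sample data this should print only the four X... lines. One thing to keep in mind: if the file ends with an incomplete group (fewer than 4 lines), those trailing lines are collected but never printed.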

Upvotes: 4
