ZYX Rhythm

Reputation: 73

Store each occurrence found by awk in an array

My previous question was flagged as a duplicate and I was pointed to this and this. The solutions in those threads do not solve this problem at all.

Content of file.txt:

Some line of text 0
Some line of text 1
Some line of text 2
PATTERN1
Some line of text 3
Some line of text 4
Some line of text 5
PATTERN2
Some line of text 6
Some line of text 7
Some line of text 8
PATTERN1
Some line of text 9
Some line of text 10
Some line of text 11
PATTERN2
Some line of text 12
Some line of text 13
Some line of text 14

I need to extract every PATTERN1 line, every PATTERN2 line, and the lines in between. The following command does this perfectly:

awk '/PATTERN1/,/PATTERN2/' ./file.txt

Output:

PATTERN1
Some line of text 3
Some line of text 4
Some line of text 5
PATTERN2

PATTERN1
Some line of text 9
Some line of text 10
Some line of text 11
PATTERN2

But now I am trying to create a bash script that:

  1. uses awk to find the lines between PATTERN1 and PATTERN2
  2. stores each occurrence of PATTERN1 + lines in between + PATTERN2 in an array
  3. repeats steps 1 and 2 until the end of the file.

To clarify, I want to store the following lines (everything inside the quotes):

"PATTERN1
Some line of text 3
Some line of text 4
Some line of text 5
PATTERN2"

to array[0]

and store the following lines inside the quotes:

"PATTERN1
Some line of text 9
Some line of text 10
Some line of text 11
PATTERN2"

to array[1]

and so on, if there are more occurrences of PATTERN1 and PATTERN2.

What I currently have:

#!/bin/bash
var0=`cat ./file.txt`
mapfile -t thearray < <(echo "$var0" | awk '/PATTERN1/,/PATTERN2/')

The above does not work.
I would also prefer to avoid mapfile where possible, because the script might be executed on a system whose bash does not support it.

Based on the link provided, I tried:

myvar=$(cat ./file.txt)
myarray=($(echo "$var0" | awk '/PATTERN1/,/PATTERN2/'))

But when I do echo ${myarray[1]}

I get a blank response.

And when I do echo ${myarray[0]}

I get:

PATTERN1
Some line of text 3
Some line of text 4
Some line of text 5
PATTERN2

PATTERN1
Some line of text 9
Some line of text 10
Some line of text 11
PATTERN2

What I expect when I do echo ${myarray[0]}

PATTERN1
Some line of text 3
Some line of text 4
Some line of text 5
PATTERN2

What I expect when I do echo ${myarray[1]}

PATTERN1
Some line of text 9
Some line of text 10
Some line of text 11
PATTERN2

Any help will be great.

Upvotes: 1

Views: 926

Answers (3)

Paul Hodges

Reputation: 15273

As Charles suggested...

Edited to strip the newline off the end of the block (not every record).

while IFS= read -r -d '' x; do array+=("$x"); done < <(awk '
  /PATTERN1/,/PATTERN2/ { if ( $0 ~ "PATTERN2" ) { x=$0; printf "%s%c",x,0; next }
                          print }' ./file.txt)

I reformatted it. It was getting kinda busy and hard to read.

And to test it -

$: echo "[${array[1]}]"
[PATTERN1
Some line of text 9
Some line of text 10
Some line of text 11
PATTERN2]
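The loop works because read -d '' treats the empty delimiter as a NUL byte, so the newlines inside a block stay part of the record. Here is a minimal sketch of just that mechanism, with bash's own printf standing in for the awk output; the records array and the sample lines are made up for illustration.

```shell
#!/bin/bash
# Two multi-line records, each terminated by a NUL byte.
# bash's builtin printf stands in for the awk output here (illustrative data).
records=()
while IFS= read -r -d '' rec; do
    # rec holds everything up to (not including) the NUL, newlines included.
    records+=("$rec")
done < <(printf 'PATTERN1\nline A\nPATTERN2\0PATTERN1\nline B\nPATTERN2\0')

# Show how many records were read, then each one between brackets
# so that any stray leading or trailing newline would be visible.
printf 'count=%d\n' "${#records[@]}"
printf '[%s]\n' "${records[@]}"
```

IFS= keeps leading and trailing whitespace intact, and -r stops read from interpreting backslashes in the data.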

As an aside, it seems very odd to me to include the redundant sentinel values in the data elements, so if you want to strip those:

$: while IFS= read -r -d '' x; do array+=("$x"); done < <(
    awk '/PATTERN1/,/PATTERN2/{ if ( $0 ~ "PATTERN1" ) { next }
      if ( $0 ~ "PATTERN2" ) { len--; 
        for (l = 0; l <= len; l++) { printf "%s%c", ary[l], (l < len ? "\n" : 0); } 
        delete ary; len=0; next }
      ary[len++]=$0;
    }' ./file.txt )

$: echo "[${array[1]}]"
[Some line of text 9
Some line of text 10
Some line of text 11]
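If you would rather not buffer lines in awk, stripping the sentinels can also be sketched with a flag variable instead of a range pattern. This is a different technique from the code above, not a drop-in copy of it. It assumes the ASCII record separator (octal 036) never appears in the data; the temporary sample file and the names inner, sep and inside are illustrative.

```shell
#!/bin/bash
# Hypothetical sample data mirroring the layout of file.txt.
sample=$(mktemp)
printf '%s\n' 'text 0' PATTERN1 'text 3' 'text 4' PATTERN2 'text 6' \
              PATTERN1 'text 9' PATTERN2 'text 12' > "$sample"

inner=()
sep=$'\036'   # ASCII record separator, assumed absent from the data
while IFS= read -r -d "$sep" chunk; do
    # Each chunk ends with the newline of its last body line; drop that one.
    inner+=("${chunk%$'\n'}")
done < <(awk '
    /PATTERN2/ { if (inside) printf "\036"; inside = 0 }  # close the block
    inside     { print }                                   # body lines only
    /PATTERN1/ { inside = 1 }                              # open after the marker
' "$sample")
rm -f "$sample"

printf '[%s]\n' "${inner[@]}"
```

The rule order matters: the closing rule runs before the print rule and the opening rule runs after it, so neither sentinel line is ever printed.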

Upvotes: 2

ZYX Rhythm

Reputation: 73

Paul's answer does what I want, so I marked it as the accepted answer. His solution leaves an extra blank line at the end of every value stored in the array, but that is easy to remove, so I did not mind. I also posted this question on another site, and although Paul's answer was good, I found a solution there that I like better:

IFS=$'\r' read -r -d '' -a ARR < <(awk '/PATTERN1/,/PATTERN2/ { if ($0 ~ /PATTERN2/) printf "%s\r", $0; else print }' file.txt)

The above does the job, does not produce an extra blank line, and it's a one-liner.

echo "${ARR[1]}"
echo "${ARR[0]}"

Output:

PATTERN1
Some line of text 9
Some line of text 10
Some line of text 11
PATTERN2

PATTERN1
Some line of text 3
Some line of text 4
Some line of text 5
PATTERN2

Upvotes: 0

M. Nejat Aydin

Reputation: 10123

An implementation in plain bash could look something like this:

#!/bin/bash

beginpat='PATTERN1'
endpat='PATTERN2'

array=()
n=-1
inpatterns=
while read -r; do
    if [[ ! $inpatterns && $REPLY = $beginpat ]]; then
        array[++n]=$REPLY
        inpatterns=1
    elif [[ $inpatterns ]]; then
        array[n]+=$'\n'$REPLY
        if [[ $REPLY = $endpat ]]; then
            inpatterns=
        fi
    fi
done

# Report captured lines
for ((i = 0; i <= n; ++i)); do
    printf "=== array[%d] ===\n%s\n\n" $i "${array[i]}"
done

Run it as ./script < file. Using awk isn't required, but the script also works correctly on the awk output.
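If the array is needed inside a larger script rather than in a standalone one, the same loop can be wrapped in a function. The following is only a sketch: collect_blocks is a made-up helper name, the input lines are sample data, and unlike the script above it quotes the patterns so they are compared literally.

```shell
#!/bin/bash
# collect_blocks is a hypothetical helper name; it fills the global array "array"
# with every begin..end block read from standard input.
collect_blocks() {
    local beginpat=$1 endpat=$2 inpatterns= n=-1 line
    array=()
    while IFS= read -r line; do
        if [[ ! $inpatterns && $line = "$beginpat" ]]; then
            array[++n]=$line          # start a new element at the opening marker
            inpatterns=1
        elif [[ $inpatterns ]]; then
            array[n]+=$'\n'$line      # append body lines and the closing marker
            [[ $line = "$endpat" ]] && inpatterns=
        fi
    done
    return 0
}

# Process substitution keeps the function in the current shell, so "array" survives.
collect_blocks PATTERN1 PATTERN2 < <(
    printf '%s\n' 'text 0' PATTERN1 'text 3' PATTERN2 'text 6' PATTERN1 'text 9' PATTERN2
)
printf 'captured %d blocks\n' "${#array[@]}"
```

Feeding the function through a pipe (for example awk ... | collect_blocks ...) would run it in a subshell by default, and the array would be gone when the pipeline ends. That is why the sketch reads from a process substitution.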

Upvotes: 3
