Reputation: 121
Given a text file test.txt with contents:
hello
someline1
someline2
...
world1
line that shouldn't match
hello
someline1
someline2
...
world2
How can I store both of these multiline matches in separate array indexes?
I'm currently trying to use regex="hello.*world[12]"
Unfortunately I can only use native Bash, so Perl etc. is off the table. Thanks
Upvotes: 0
Views: 168
Reputation: 17178
I would use awk and mapfile (the -d option of mapfile requires bash >= 4.4):
#!/bin/bash
mapfile -d '' arr < <(
awk '/hello/{f=1} f; /world[12]/ && f {f=0; printf "\000"}' test.txt
)
Resulting array:
arr=([0]=$'hello\nsomeline1\nsomeline2\n...\nworld1\n' [1]=$'hello\nsomeline1\nsomeline2\n...\nworld2\n')
notes:
awk '/hello/{f=1} f; /world[12]/ && f {f=0; printf "\000"}'
- when a line matches hello, set the flag f to true
- for each line, print it if the flag is true
- when a line matches world[12] and the flag is true, set the flag to false and print a null-byte delimiter
mapfile -d '' arr
- split the input into an array whose elements are delimited by a null byte (instead of \n)
version for older bash:
#!/bin/bash
arr=()
while IFS='' read -r -d '' block
do
arr+=( "$block" )
done < <(
awk '/hello/{f=1} f; /world[12]/ && f{f=0; printf "\000"}' test.txt
)
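Each element stored this way keeps the final newline of its block. As a minimal sketch of the NUL-delimited read, with that trailing newline stripped, the loop below uses the printf builtin as a stand-in for the awk output, so it needs no input file. The sample records are made up for illustration.

```bash
#!/bin/bash
# Stand-in for the awk output: two multi-line records, each ended by a NUL byte.
# (Illustrative data; in the answer above this stream comes from awk.)
arr=()
while IFS= read -r -d '' block; do
    # Remove the single trailing newline that each block carries.
    arr+=( "${block%$'\n'}" )
done < <(printf 'hello\nsomeline1\nworld1\n\000hello\nsomeline1\nworld2\n\000')

# Show the array in a form that makes embedded newlines visible.
declare -p arr
```

Whether to strip the newline is a matter of taste; keeping it makes `printf '%s' "${arr[0]}"` reproduce the original lines exactly.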
Upvotes: 1
Reputation: 22032
Bash's regex matching has no equivalent of Python's findall() function, so we need to capture the matched substrings one at a time in a loop.
Would you please try the following:
#!/bin/bash
str=$(<test.txt)
regex="hello.world[12]"
while [[ $str =~ ($regex)(.*) ]]; do
    ary+=( "${BASH_REMATCH[1]}" )   # store the match into an array
    str="${BASH_REMATCH[2]}"        # remaining substring
done
for i in "${!ary[@]}"; do           # see the result
    echo "[$i] ${ary[$i]}"
done
Output:
[0] hello
world1
[1] hello
world2
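For context, here is a small sketch of why the pattern from the question, hello.*world[12], cannot simply be dropped into this loop when other lines sit between the markers. In [[ =~ ]] the . also matches a newline and .* is greedy, so the match should run from the first hello to the last world. The sample string is a shortened, made-up version of the question's file.

```bash
#!/bin/bash
# Shortened stand-in for the question's test.txt (illustrative data).
str=$'hello\nsomeline1\nworld1\nline that should not match\nhello\nsomeline1\nworld2'

regex='hello.*world[12]'    # pattern from the question

if [[ $str =~ $regex ]]; then
    match=${BASH_REMATCH[0]}
    if [[ $match == "$str" ]]; then
        # Greedy matching swallowed both blocks and the line between them.
        echo "one match covering the whole text, unwanted line included"
    else
        echo "match is shorter than the text"
    fi
fi
```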
[Edit]
If there are lines between "hello" and "world", we need to change the approach, because bash's regex does not support non-greedy (shortest) matching. In that case, how about:
regex1="hello"
regex2="world"
while IFS= read -r line; do
    if [[ $line =~ $regex1 ]]; then
        str="$line"$'\n'
        f=1
    elif (( f )); then
        str+="$line"$'\n'
        if [[ $line =~ $regex2 ]]; then
            ary+=("$str")
            f=0
        fi
    fi
done < test.txt
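The same line-by-line idea can be wrapped in a function so the start and end patterns become parameters. This is only a sketch: the name collect_blocks, the global array blocks, and the sample text are assumptions made for illustration, not part of the answer above.

```bash
#!/bin/bash
# collect_blocks START_RE END_RE   (hypothetical helper)
# Reads stdin and fills the global array "blocks", one element per block
# that begins at a line matching START_RE and ends at a line matching END_RE.
collect_blocks() {
    local start_re=$1 end_re=$2 line buf='' inside=0
    blocks=()
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line =~ $start_re ]]; then
            buf=$line$'\n'          # a start line (re)starts the buffer
            inside=1
        elif (( inside )); then
            buf+=$line$'\n'
            if [[ $line =~ $end_re ]]; then
                blocks+=("$buf")    # end line reached: store the block
                inside=0
            fi
        fi
    done
}

# Made-up sample mirroring the layout of the question's test.txt.
sample=$'hello\nsomeline1\nsomeline2\nworld1\nline that should not match\nhello\nsomeline1\nsomeline2\nworld2'
collect_blocks 'hello' 'world[12]' <<< "$sample"

printf 'found %d blocks\n' "${#blocks[@]}"
for i in "${!blocks[@]}"; do
    printf '[%d]\n%s' "$i" "${blocks[$i]}"
done
```

For a real file the call would be `collect_blocks 'hello' 'world[12]' < test.txt`. As in the loop above, a block whose start and end markers share a single line is not closed by that line.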
Upvotes: 2