Mircea

Reputation: 1999

How to split by regex in shell script

I have the following output example:

[OK] AAA.BBBBBB
aaabbbcccdddfffed
asdadadadadadsada
[OK] CCC.KKKKKKK
some text here
[OK] OKO.II

and I want to parse it with this regex: \[OK\](\s+\w+)\.(\w+)\n([^\[]+)

But when I run my shell script, which looks like this:

#!/bin/bash

# Define the text to parse
text="[OK] AAA.BBBBBB
aaabbbcccdddfffed
asdadadadadadsada
[OK] CCC.KKKKKKK
some text here
[OK] OKO.II"

# Create an empty list to hold the group lists
# Loop through the text and extract all matches
regex_pattern="\[OK\](\s+\w+)\.(\w+)\n([^\[]+)"
while [[ $text =~ $regex_pattern ]]; do
  # Create a list to hold the current groups
  echo "Matched_1: ${BASH_REMATCH[1]}"
  echo "Matched_2: ${BASH_REMATCH[2]}"
  echo "Matched_3: ${BASH_REMATCH[3]}"
  echo "-------------------"
done

it does not output anything.

Upvotes: 1

Views: 404

Answers (2)

Gilles Quénot

Reputation: 185530

Using PCRE with grep (as explained in the comments, bash has no multiline mode):

#!/bin/bash

# Define the text to parse
text="[OK] AAA.BBBBBB
aaabbbcccdddfffed
asdadadadadadsada
[OK] CCC.KKKKKKK
some text here
[OK] OKO.II"

grep -Pzo '(?m)\[OK\](\s+\w+)\.(\w+)\n([^\[]+)' <<< "$text"

Output

[OK] AAA.BBBBBB
aaabbbcccdddfffed
asdadadadadadsada
[OK] CCC.KKKKKKK
some text here

The regex is yours, unchanged apart from the (?m) prefix.

See the regex101 explanation of (?m).


Or with Perl (different output, since only the first match is printed):

perl -0777 -ne 'print $& if m/\[OK\](\s+\w+)\.(\w+)\n([^\[]+)/' <<< "$text"

Output

[OK] AAA.BBBBBB
aaabbbcccdddfffed
asdadadadadadsada

Upvotes: 1

glenn jackman

Reputation: 247012

Bash does not do global matching.

What you can do instead: after each match, remove everything up to and including the matched text from the start of the string, then try to match again.

text="[OK] AAA.BBBBBB
aaabbbcccdddfffed
asdadadadadadsada
[OK] CCC.KKKKKKK
some text here
[OK] OKO.II"

re=$'\[OK\][[:space:]]+([[:alnum:]_]+)\.([[:alnum:]_]+)([^[]*)'
#                  no newline characters in the regex  ^^^^^^^

while [[ $text =~ $re ]]; do
    # output the match info
    declare -p BASH_REMATCH
    # and remove the matched text from the start of the string
    # (don't forget the quotes here!)
    text=${text#*"${BASH_REMATCH[0]}"}
done

outputs

declare -a BASH_REMATCH=([0]=$'[OK] AAA.BBBBBB\naaabbbcccdddfffed\nasdadadadadadsada\n' [1]="AAA" [2]="BBBBBB" [3]=$'\naaabbbcccdddfffed\nasdadadadadadsada\n')
declare -a BASH_REMATCH=([0]=$'[OK] CCC.KKKKKKK\nsome text here\n' [1]="CCC" [2]="KKKKKKK" [3]=$'\nsome text here\n')
declare -a BASH_REMATCH=([0]="[OK] OKO.II" [1]="OKO" [2]="II" [3]="")

Clearly, this destroys the $text variable, so make a copy if you need it after the loop.

The regex makes the solution a bit fragile: there cannot be any open brackets in the "following" lines.


Having said all that, this is not what bash is really good for. I'd use awk or perl for this task.
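
For illustration, one way the awk route could look, called from a shell script. It is a sketch under two assumptions that go beyond the question: every header sits on its own line starting with "[OK] ", and the name after it contains exactly one dot. The emit function, the variable names and the "|"-separated output format are made up for the example.

```shell
#!/bin/bash

text="[OK] AAA.BBBBBB
aaabbbcccdddfffed
asdadadadadadsada
[OK] CCC.KKKKKKK
some text here
[OK] OKO.II"

# A line starting with "[OK] " opens a new record; every other line is
# appended to the body of the current record.
result=$(awk '
    function emit() {
        if (have) printf "%s|%s|%s\n", first, second, body
    }
    /^\[OK\] / {
        emit()                      # print the previous record, if any
        split($2, parts, "[.]")     # "AAA.BBBBBB" -> parts[1], parts[2]
        first = parts[1]; second = parts[2]
        body = ""; have = 1
        next
    }
    have { body = (body == "" ? $0 : body " " $0) }
    END { emit() }                  # print the last record
' <<< "$text")

printf '%s\n' "$result"
```

Because records are split on header lines rather than on the [ character, body lines may contain open brackets, and a header without any body (OKO.II here) is still reported with an empty third field.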

Upvotes: 2
