Hubbs

Reputation: 173

Bash: splitting a multi-line string by a multi-character delimiter into an array

I have searched for similar topics here, but most questions involved a single-character delimiter.

I have this sample of text:

Some text here,
continuing on next lineDELIMITERSecond chunk of text
which may as well continue on next lineDELIMITERFinal chunk

And the desired output is a list (extracted=()) which contains:

  1. Some text here, continuing on next line
  2. Second chunk of text which may as well continue on next line
  3. Final chunk

As can be seen from the sample, "DELIMITER" is the string to split on.

I have tried numerous samples from SO, including awk, string replacement, etc.

Upvotes: 3

Views: 2742

Answers (6)

stack0114106

Reputation: 8711

You can try Perl. With the -0777 option, perl slurps the entire file into the $_ variable. You can then split the content on DELIMITER:

$ perl -0777 -ne '@x=split("DELIMITER");print join("\n\n",@x) ' hubbs.txt
Some text here,
continuing on next line

Second chunk of text
which may as well continue on next line

Final chunk

$

To add array positions while printing:

$ perl -0777 -ne '@x=split("DELIMITER"); for(@x) { print ++$i,". $_\n"  } ' hubbs.txt
1. Some text here,
continuing on next line
2. Second chunk of text
which may as well continue on next line
3. Final chunk


$

Upvotes: 0

Bach Lien

Reputation: 1060

I think the main challenge in the question is to handle spaces, newlines, and DELIMITER correctly, and then put everything in an array. If it were only a matter of splitting the file, it would be too easy. How about this template:

#!/bin/bash
gencode(){
  echo -e "extracted=(); read -r -d '' item <<-DELIMITER"
  sed 's:DELIMITER:\n&\nextracted+=("$item"); read -r -d "" item <<-&\n:' Input_file;
  echo -e "DELIMITER\n"'extracted+=("$item")'
}
gencode|cat -n                                 # for explanation purposes only
eval "$(gencode)"                              # do not remove "eval"
for (( i=0; i < ${#extracted[@]}; i++ )); do   # print results
  echo "$i: ${extracted[i]}"
done

Outputs

     1  extracted=(); read -r -d '' item <<-DELIMITER
     2  Some text here,
     3  continuing on next line
     4  DELIMITER
     5  extracted+=("$item"); read -r -d "" item <<-DELIMITER
     6  Second chunk of text
     7  which may as well continue on next line
     8  DELIMITER
     9  extracted+=("$item"); read -r -d "" item <<-DELIMITER
    10  Final chunk
    11  DELIMITER
    12  extracted+=("$item")
0: Some text here,
continuing on next line
1: Second chunk of text
which may as well continue on next line
2: Final chunk

Upvotes: 0

FrancJnr

Reputation: 31

You can try using arrays.

#!/bin/bash
str="continuing on next lineDELIMITERSecond chunk of text
which may as well continue on next lineDELIMITERFinal chunk";


delimiter=DELIMITER
s=$str$delimiter

array=();
while [[ $s ]]; do
array+=( "${s%%"$delimiter"*}" );
s=${s#*"$delimiter"};
done;
declare -p array

This splits your text on the delimiter and stores the pieces in an array:

array=([0]="continuing on next line" [1]=$'Second chunk of text\nwhich may as well continue on next line' [2]="Final chunk")

You can access each element using the array indices, or print all elements using printf '%s\n' "${array[@]}"

The result will be:

continuing on next line
Second chunk of text
which may as well continue on next line
Final chunk

The solution gives you an opportunity to do a lot with your text.
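
As a sketch of how the same loop could be applied to the full sample from the question, the version below wraps it in a function and then joins wrapped lines with a space, which is how the desired output in the question is written. The function name split_on is hypothetical, and the code assumes a non-empty delimiter.

```shell
#!/usr/bin/env bash
# split_on is a hypothetical helper name; it fills the global array "extracted".
# It assumes the delimiter is non-empty (an empty one would loop forever).
split_on() {
  local s=$1 delimiter=$2
  extracted=()
  s+=$delimiter                          # sentinel so the last chunk is cut too
  while [[ $s ]]; do
    extracted+=( "${s%%"$delimiter"*}" ) # text before the first delimiter
    s=${s#*"$delimiter"}                 # drop that chunk and its delimiter
  done
}

text=$'Some text here,\ncontinuing on next lineDELIMITERSecond chunk of text\nwhich may as well continue on next lineDELIMITERFinal chunk'
split_on "$text" DELIMITER

# Join wrapped lines with a space, as in the question's desired output.
for i in "${!extracted[@]}"; do
  extracted[i]=${extracted[i]//$'\n'/ }
done

printf '%s\n' "${extracted[@]}"
```

With the newlines replaced, each of the three elements prints on a single line. Everything here is done with parameter expansion, so no external tools are involved.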

Upvotes: 1

tshiono

Reputation: 22012

With AWK, please try the following (RS='^$' is a GNU awk idiom for reading the whole file as a single record):

awk -v RS='^$' -v FS='DELIMITER' '{
    n = split($0, extracted)
    for (i=1; i<=n; i++) {
        print i". "extracted[i]
    }
}' sample.txt

which yields:

1. Some text here,
continuing on next line
2. Second chunk of text
which may as well continue on next line
3. Final chunk

If you need to transfer the awk array to a bash array, a further step is needed, depending on how the array is processed afterwards.

Upvotes: 1

RavinderSingh13

Reputation: 133428

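If you don't want to change the default RS value, you could try the following. Note that it turns every DELIMITER into a newline, so chunk boundaries look the same as the original line breaks.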

awk '{gsub("DELIMITER",ORS)} 1' Input_file

Upvotes: 5

Romeo Ninov

Reputation: 7215

You can try something like:

awk 'BEGIN {RS="DELIMITER";} {print}' input_file

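You can then assign the output to a variable, etc. Note that a multi-character RS is an extension (supported by GNU awk and mawk, among others); POSIX leaves its behavior unspecified.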
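
One way the assignment step might look, as a hedged sketch: awk writes each record followed by a control character, and mapfile splits on that character to fill a bash array. It assumes bash 4.4 or later, an awk that accepts a multi-character RS, and that the byte 0x1E (ASCII "record separator") never occurs in the input. The temporary file only makes the example self-contained.

```shell
#!/usr/bin/env bash
# Recreate the question's sample in a temporary file (illustrative only).
sample=$(mktemp)
printf '%s\n' \
  'Some text here,' \
  'continuing on next lineDELIMITERSecond chunk of text' \
  'which may as well continue on next lineDELIMITERFinal chunk' > "$sample"

sep=$'\036'   # ASCII record separator; assumed never to occur in the input

# awk splits records on DELIMITER and terminates each one with 0x1E;
# mapfile then splits on 0x1E, so newlines inside a chunk are preserved.
mapfile -d "$sep" -t extracted < <(
  awk 'BEGIN { RS = "DELIMITER"; ORS = "\036" } { print }' "$sample"
)
rm -f "$sample"

# The file's final newline stays attached to the last record; drop it.
extracted[-1]=${extracted[-1]%$'\n'}

declare -p extracted
```

A control character is used instead of a newline so that the original line breaks inside each chunk are not mistaken for chunk boundaries.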

Upvotes: 0
