Michael

Reputation: 42110

How to split a text file content by a string?

Suppose I've got a text file that consists of two parts separated by a delimiting string ---:

aa
bbb
---
cccc
dd

I am writing a Bash script to read the file and assign the first part to var part1 and the second part to var part2:

part1= ... # should be aa\nbbb
part2= ... # should be cccc\ndd

How would you suggest writing this in Bash?

Upvotes: 1

Views: 631

Answers (4)

Matthias Braun

Reputation: 34423

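If your input may grow to more than two parts, you can split it with awk (which calls the parts "records") and collect the records in an array with readarray. This relies on an awk that accepts a regular expression as RS (such as gawk) and on readarray -d, which is available in Bash 4.4 and later: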

# The record separator is `---` on its own line, either at the beginning of
# the file (anchored with ^) or between records.
# The output record separator is the null character.
# With `NF` (number of fields), awk skips empty records.
# The `readarray` command reads the records into an array, splitting on the
# null character with `-d ''`.
readarray -d '' records < <(
  awk 'BEGIN {RS="(^|\n)---\n"; ORS="\0"} NF' "$input_file"
)

for record in "${records[@]}"; do
    echo "#### begin of record #####"
    printf "%s\n" "$record"
    echo "#### end of record #####"
done

See also this question.
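For comparison, the same multi-part split can be sketched in plain Bash, without awk or readarray -d. This is a substitute technique, not the method above. It assumes the delimiter is a line that is exactly ---, and the temporary file and its sample content are only there to make the sketch self-contained:

```shell
#!/usr/bin/env bash
# Sample input from the question (temporary file, for illustration only).
input_file=$(mktemp)
printf '%s\n' aa bbb --- cccc dd > "$input_file"

records=()   # one array element per part
current=''   # lines collected for the part currently being read

# `|| [[ -n $line ]]` also handles a last line without a trailing newline.
while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line == '---' ]]; then
        # Delimiter reached: store the finished part, skipping parts with no lines.
        [[ -n $current ]] && records+=("${current%$'\n'}")
        current=''
    else
        current+=$line$'\n'
    fi
done < "$input_file"

# Store the last part, which has no delimiter after it.
[[ -n $current ]] && records+=("${current%$'\n'}")
rm -f "$input_file"

part1=${records[0]}
part2=${records[1]}
printf 'part1=[%s]\npart2=[%s]\n' "$part1" "$part2"
```

The loop is slower than awk on large inputs, but for a small file it avoids depending on a particular awk implementation or Bash version.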

Upvotes: 1

sungtm

Reputation: 577

Using csplit:

csplit --elide-empty-files --quiet --prefix=foo_bar file.txt "/^---$/" "{*}" && sed -i '/^---$/d' foo_bar*

If your coreutils version is 8.22 or later, the --suppress-matched option can be used instead, and the sed step is not required:

csplit --suppress-matched --elide-empty-files --quiet --prefix=foo_bar file.txt "/---/" "{*}"
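Since csplit writes the parts to files rather than variables, one more step is needed to get part1 and part2. A minimal sketch, assuming coreutils 8.22 or later and csplit's default two-digit suffixes (foo_bar00, foo_bar01); the temporary directory and sample content are only for illustration, and the pattern is anchored as ^---$ here so that only a line consisting of exactly --- counts as the delimiter:

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so the generated files are easy to clean up.
workdir=$(mktemp -d)
printf '%s\n' aa bbb --- cccc dd > "$workdir/file.txt"

# Split at every line that is exactly `---`; --suppress-matched drops
# the delimiter lines themselves from the output files.
(
    cd "$workdir" &&
    csplit --suppress-matched --elide-empty-files --quiet \
        --prefix=foo_bar file.txt '/^---$/' '{*}'
)

# $(< file) reads a file's content; trailing newlines are stripped.
part1=$(< "$workdir/foo_bar00")
part2=$(< "$workdir/foo_bar01")
rm -rf "$workdir"

printf 'part1=[%s]\npart2=[%s]\n' "$part1" "$part2"
```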

Upvotes: 2

axiac

Reputation: 72415

A solution using sed:

foo=$(sed -n '/^---$/q;p' file.txt)
bar=$(sed -n '1,/^---$/b;p' file.txt)

The -n command line option tells sed to not print the input lines as it processes them (by default it prints them). sed runs a script for each input line it processes.

The first sed script

/^---$/q;p

contains two commands (separated by ;):

  • /^---$/q - quit when you reach the line matching the regex ^---$ (a line that contains exactly three dashes);
  • p - print the current line.

The second sed script

1,/^---$/b;p

contains two commands:

  • 1,/^---$/b - from line 1 through the first line matching the regex ^---$ (a line that contains only ---), branch to the end of the script (i.e. skip the second command);
  • p - print the current line.
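The two scripts can be tried on the sample from the question. This is only a sketch: it assumes GNU sed (which accepts a semicolon directly after b), and the temporary file is created just for the demonstration:

```shell
#!/usr/bin/env bash
# Sample input from the question (temporary file, for illustration only).
file=$(mktemp)
printf '%s\n' aa bbb --- cccc dd > "$file"

# Print lines until the delimiter, then quit without printing it.
part1=$(sed -n '/^---$/q;p' "$file")

# Skip everything from line 1 through the delimiter, print the rest.
part2=$(sed -n '1,/^---$/b;p' "$file")
rm -f "$file"

printf 'part1=[%s]\npart2=[%s]\n' "$part1" "$part2"
```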

Upvotes: 1

hek2mgl

Reputation: 158250

You can use awk:

foo="$(awk 'NR==1' RS='---\n' ORS='' file.txt)"
bar="$(awk 'NR==2' RS='---\n' ORS='' file.txt)"

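This reads the file twice. That shouldn't be a problem here: storing file content in shell variables should generally be limited to small files anyway, and for a small file the second read is cheap. Note that a multi-character RS is treated as a regular expression by gawk and mawk; POSIX leaves that behavior unspecified.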

Note: Depending on your actual task, you may be able to do the whole thing in awk. Then you don't need to store the content in shell variables or read the file twice.
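As an illustration of doing the work inside awk, here is a sketch that processes both parts in a single pass. The task (counting the lines in each part) is hypothetical and only stands in for whatever the script actually needs to do; the temporary file is for the demonstration only. It uses nothing beyond POSIX awk:

```shell
#!/usr/bin/env bash
# Sample input from the question (temporary file, for illustration only).
file=$(mktemp)
printf '%s\n' aa bbb --- cccc dd > "$file"

summary=$(awk '
    BEGIN     { part = 1 }            # number of the part being read
    /^---$/   { part++; next }        # delimiter line: switch to the next part
              { lines[part]++ }       # any other line: count it for this part
    END {
        for (i = 1; i <= part; i++)
            printf "part%d: %d lines\n", i, lines[i]
    }
' "$file")
rm -f "$file"

printf '%s\n' "$summary"
```

The file is read once, and no part of its content has to pass through a shell variable.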

Upvotes: 2
