Reputation: 2854
I've got a strange edge-case.
I have a long string which contains \n (newline characters).
So the string looks something like:
text="loremipsum\nDollor sit atmet \n aliquyam erat,
sed diam\naliquyam erat \n sed diam"
I need to split the string into an array, but keep the newline characters uninterpreted, so the array/output looks like:
"loremipsum\n"
"Dollor sit atmet \n"
"aliquyam erat, sed diam\n"
"aliquyam erat \n"
"sed diam"
I couldn't find a way to split the string and preserve the \n characters.
If I use IFS=$"\n" the \n characters are deleted, but if I use IFS="\n" the string gets split and all occurrences of n are deleted.
I tried it like:
IFS=$"\n" read -d '' -a arr <<<"$text"
How can I solve this?
The text is dynamic and can be very long (3000+ chars), so creating the array like: declare -a arr=([0]=$'loremipsum\n'... is not an option.
The \n characters (0x5c + 0x6e in ASCII) should all be treated the same; they should not be replaced with an actual newline.
The \n characters must be preserved, because the program which gets the output looks for these in plaintext.
The \n characters can be at any position in a sentence, including inside a word like: lor\nem or with spaces: Lorem \n ipsum
So the \n characters must be at the end of the elements inside the array, as shown above.
The text must only be split at \n, not at spaces etc.
Upvotes: 1
Views: 98
Reputation: 84521
You can use process substitution and echo, e.g.
text="loremipsum\nDollor sit atmet \n aliquyam erat, sed diam\naliquyam erat \n sed diam"
readarray arr < <(echo -e "$text")
You can also use printf in the process substitution, e.g.
< <(printf "$text")
Since the -t option is not given to readarray, the '\n' is included as part of the array element.
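To see the effect of -t in isolation, here is a minimal sketch. The variable names input, with_nl and without_nl are made up for illustration, and readarray needs bash 4 or later:

```bash
#!/usr/bin/env bash
# Hypothetical sample input: three lines separated by real newlines.
input=$'one\ntwo\nthree'

# Without -t, each element keeps the newline that terminated its line.
# (The here-string appends a final newline, so the last element has one too.)
readarray with_nl <<<"$input"

# With -t, the trailing newline is stripped from every element.
readarray -t without_nl <<<"$input"

# Show both arrays for comparison.
declare -p with_nl without_nl
```

Both arrays end up with three elements; only the trailing newline differs.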
Example Use/Output
Adding a declare -p arr to output the array, you would have:
text="loremipsum\nDollor sit atmet \n aliquyam erat, sed diam\naliquyam erat \n sed diam"
readarray arr < <(echo -e "$text")
declare -p arr
declare -a arr=([0]=$'loremipsum\n' [1]=$'Dollor sit atmet \n' [2]=$' aliquyam erat, sed diam\n' [3]=$'aliquyam erat \n' [4]=$' sed diam\n')
If you want to trim a leading whitespace character, you can use the parameter expansion ${element#[[:space:]]}. Up to you.
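If you would rather strip all leading whitespace from every element in one pass, a loop along these lines should work. This is only a sketch: the names element and leading are mine, and it assumes the array was filled with readarray as above.

```bash
#!/usr/bin/env bash
text="loremipsum\nDollor sit atmet \n aliquyam erat, sed diam\naliquyam erat \n sed diam"
readarray arr < <(echo -e "$text")

for i in "${!arr[@]}"; do
    element=${arr[i]}
    # Cut everything from the first non-whitespace character onward;
    # what is left is exactly the leading whitespace (possibly empty).
    leading=${element%%[![:space:]]*}
    # Remove that leading whitespace; the trailing newline is untouched.
    arr[i]=${element#"$leading"}
done

declare -p arr
```

Elements that do not start with whitespace are left unchanged, and each element still ends with the newline that readarray kept.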
Upvotes: 3
Reputation: 33984
My understanding from the sample (input/output) data given:
there is an actual newline character in text (between erat, and sed diam); this is to be removed and, assuming there is no (space) after erat, we need to add a (space), i.e., replace the actual newline character with a (space);
the text also contains the literal \ + n; we are to break the array after these literals; the literal \ + n are to remain in the text that is stored in the array.
One idea:
text="loremipsum\nDollor sit atmet \n aliquyam erat,
sed diam\naliquyam erat \n sed diam"
# convert actual newline character to a (space)
text=${text//$'\n'/ }
# add an actual newline character after the literal `\` + `n`
text=${text//\\n/\\n$'\n'}
# print our value, remove leading (space), and load into array
IFS=$'\n' arr=( $(printf "%s\n" "${text}." | sed 's/^ //g') )
# display array
typeset -p arr
declare -a arr=([0]="loremipsum\\n" [1]="Dollor sit atmet \\n" [2]="aliquyam erat, sed diam\\n" [3]="aliquyam erat \\n" [4]="sed diam.")
# loop through array and display individual strings; add double quotes as delimiters for display purposes
for i in "${!arr[@]}"
do
echo "\"${arr[${i}]}\""
done
"loremipsum\n"
"Dollor sit atmet \n"
"aliquyam erat, sed diam\n"
"aliquyam erat \n"
"sed diam."
Upvotes: 2
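A possible variation on the same idea, in case you want to avoid word splitting and sed altogether: walk the string with parameter expansion and cut after each literal \ + n. This is a minimal sketch, not a tuned solution; the names rest, sep and piece are made up for illustration, and it assumes (as above) that real newlines should become a single space and that at most one leading space is dropped per element.

```bash
#!/usr/bin/env bash
# Single quotes keep \n as the two literal characters backslash + n.
text='loremipsum\nDollor sit atmet \n aliquyam erat,
sed diam\naliquyam erat \n sed diam'

# Replace each real newline with a space.
rest=${text//$'\n'/ }

sep='\n'    # the literal two-character separator
arr=()
while [[ $rest == *"$sep"* ]]; do
    piece=${rest%%"$sep"*}        # text before the first literal \n
    arr+=( "${piece# }$sep" )     # drop one leading space, keep the literal \n
    rest=${rest#*"$sep"}          # continue after that separator
done

# Whatever is left contains no separator; keep it unless it is empty.
if [[ -n $rest ]]; then
    arr+=( "${rest# }" )
fi

declare -p arr
```

With the sample text this gives the five elements asked for in the question, the first four ending in the literal \n and the last one being sed diam. Since nothing is expanded unquoted, characters such as * in the text are not subject to globbing.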