Reputation: 1296
I'm trying to find quoted strings in a file. Occasionally, those strings contain special characters, including backslash-escaped quotes (e.g. \").
Using a zsh command on macOS Catalina (with GNU sed, not BSD sed, although awk and similar tools are fine too), what's the most efficient way for me to store those values in an array?
Sample Input:
a file that contains...
The "quick" "\(brown)" fox
jumps "over \n\"the $?@%\"" fence
Expected Output:
the array below...
echo -E - ${array[@]}
"quick" "\(brown)" "over \n\"the $?@%\""
EDIT
I'm willing to forgo the efficient part, and just focus on something that will work.
Also, I'm not trying to handcuff anyone to awk or sed. The script needs to be able to run on a vanilla macOS system, so any commands available there are fine.
EDIT
So here's where I'm currently at...
while read line; do
echo -E - $line | sed 's/\\*(/\\\(/g' | awk -F\" '{print $2}'
done < SampleInput
...which outputs:
quick
over n
At this point, I need two things to be fixed to print the values that I'd be storing in the array:
(1) I need to preserve the special characters.
(2) I need to keep more than just the second field. I think I need to count the quotes while ignoring the escaped ones, then print every other field.
From there, loading those printed fields into an array using xargs shouldn't be too hard to figure out.
I've had some other similar questions recently, so I think it's possible to preserve the special characters; what will be ugly is skipping every other field.
Eventually I'll get this, but I would appreciate the help from anyone who knows these commands better.
Thanks in advance.
Upvotes: 2
Views: 185
Reputation: 58430
This might work for you (GNU sed):
sed -E 's/^[^"]*"([^"\]*(\\.[^"\]*)*)" */\1\n/;/^[^\n]*\n/P;D' file > file1
The sed invocation whittles down each line in file, removing any non-words (strings not surrounded by double quotes) and placing a newline after each recognised word. Thus each line of file1 will contain a double quoted word, less its double quotes.
N.B. The regexp treats any character following a \ as escaped, so a \" inside a word does not end it.
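To get from that output to the array the question asks for, one option is to read the sed output line by line. This is only a sketch under a few assumptions: the temporary sample file and the array name words are made up for illustration, sed is GNU sed, and the shell supports process substitution (zsh and bash both do).

```shell
# Recreate the sample input from the question in a temporary file (hypothetical path).
sample=$(mktemp)
cat > "$sample" <<'EOF'
The "quick" "\(brown)" fox
jumps "over \n\"the $?@%\"" fence
EOF

# Run the sed command from this answer (GNU sed assumed) and store
# each output line as one array element. IFS= and -r keep leading
# blanks and backslashes exactly as they appear.
words=()
while IFS= read -r word; do
  words+=("$word")
done < <(sed -E 's/^[^"]*"([^"\]*(\\.[^"\]*)*)" */\1\n/;/^[^\n]*\n/P;D' "$sample")

# Print one element per line to check the result.
printf '%s\n' "${words[@]}"
rm -f "$sample"
```

The elements come out without their surrounding double quotes, as described above. If the quotes are wanted in the array, they can be added back when appending, e.g. words+=("\"$word\"").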
Upvotes: 1
Reputation: 5975
Here is an attempt with awk, but it needs more testing; I have only tested it against the sample input.
> cat test.awk
BEGIN { RS="\"" }
p { printf "%s", $0 }
($0 ~ /\\$/) { if (p) { printf "%s", "\"" }; next }
{ if (p) { p=0 } else { p=1; printf "\n" } }
p is the printing mode and RS is the double quote. We do not switch the printing mode when we find an escaped double quote, that is, a record ending with a backslash.
> cat file
The "quick" "\(brown)" fox
jumps "over \n\"the $?@%\"" fence
> awk -f test.awk file
quick
\(brown)
over \n\"the $?@%\"
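If the values should end up in an array with their double quotes restored, as in the question's expected output, something along these lines might do. Treat it as a sketch: the temporary sample file is hypothetical, the awk program is the one above passed inline, and the loop assumes no quoted string is empty, because it drops blank lines.

```shell
# Recreate the sample input from the question in a temporary file (hypothetical path).
sample=$(mktemp)
cat > "$sample" <<'EOF'
The "quick" "\(brown)" fox
jumps "over \n\"the $?@%\"" fence
EOF

array=()
while IFS= read -r word; do
  # The awk script prints a newline before each word, so its output
  # starts with a blank line; skip it (this would also skip "").
  [ -n "$word" ] || continue
  # Put back the surrounding double quotes that RS consumed.
  array+=("\"$word\"")
done < <(awk '
  BEGIN { RS = "\"" }
  p { printf "%s", $0 }
  ($0 ~ /\\$/) { if (p) { printf "%s", "\"" }; next }
  { if (p) { p = 0 } else { p = 1; printf "\n" } }
' "$sample")

# Print one element per line to check the result.
printf '%s\n' "${array[@]}"
rm -f "$sample"
```

In zsh, echo -E - ${array[@]} should then print the elements on one line, separated by spaces.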
Upvotes: 1