Mercutio

Reputation: 1296

zsh - caching quoted strings in an array, efficiently

I'm trying to find quoted strings in a file. Occasionally, those strings contain special characters, including backslash-escaped quotes (e.g. \").

Using a zsh command on macOS Catalina (GNU sed, not BSD; awk and other tools are fine too), what's the most efficient way to cache those values in an array?

Sample Input:

a file that contains...

The "quick" "\(brown)" fox
jumps "over \n\"the $?@%\"" fence

Expected Output:

the array below...

echo -E - ${array[@]}
"quick" "\(brown)" "over \n\"the $?@%\""

EDIT

I'm willing to forgo the efficient part, and just focus on something that will work.

Also I’m not trying to handcuff anyone to awk or sed. The script needs to be able to run on a vanilla macOS system, any commands available there are fine.

EDIT

So here's where I'm currently at...

while read line; do 
    echo -E - $line | sed 's/\\*(/\\\(/g' | awk -F\" '{print $2}'
done < SampleInput 

...which outputs:

quick
over n

At this point, I need two things to be fixed to print the values that I'd be storing in the array:

(1) I need to preserve the special characters.

(2) I need to keep more than just the second field. Thinking I need to count the quotes while ignoring the escaped quote, then print every other field.
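A different route from counting fields is to match each quoted string directly and let the pattern skip over backslash-escaped characters. The sketch below assumes `grep -o` with extended regular expressions is available (it is in both GNU and BSD grep); the function name `extract_quoted` is made up for illustration, and the pattern has only been reasoned through against the sample input.

```shell
# Hypothetical helper: print every double-quoted string, quotes included, one per line.
# Pattern: an opening quote, then any run of "non-quote, non-backslash" characters
# or "backslash plus any character" pairs, then the closing quote.
extract_quoted() {
  grep -oE '"([^"\\]|\\.)*"' "$@"
}

# Demo on the sample input (the quoted heredoc keeps backslashes and $ literal).
sample=$(mktemp)
cat > "$sample" <<'EOF'
The "quick" "\(brown)" fox
jumps "over \n\"the $?@%\"" fence
EOF
extract_quoted "$sample"
rm -f "$sample"
```

One limitation of this sketch: an escaped quote that appears outside a quoted string would be treated as an opening quote.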

From there, loading those printed fields into an array using xargs shouldn't be too hard to figure out.
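For the loading step, a plain read loop may be simpler than xargs. The sketch below assumes one string per line on standard input; `load_lines` is a hypothetical name. Note the `-r`: without it, `read` treats backslashes as escape characters and drops them, which is why the loop above prints "over n" instead of "over \n".

```shell
# Hypothetical helper: read stdin line by line into the global array "array".
load_lines() {
  array=()
  # IFS= keeps leading/trailing blanks; -r keeps backslashes literal.
  while IFS= read -r line; do
    array+=("$line")
  done
}

# Demo: feed it the three expected strings via a quoted heredoc.
load_lines <<'EOF'
"quick"
"\(brown)"
"over \n\"the $?@%\""
EOF

printf '%s\n' "${#array[@]}"    # number of elements
printf '%s\n' "${array[@]}"     # one element per line, backslashes intact
```

The function is fed by redirection rather than a pipe because, in bash, a loop on the receiving end of a pipe runs in a subshell and the array would be lost.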

I've had some other similar questions recently, so I think it's possible to preserve the special characters; what will be ugly is skipping every other field.

Eventually I'll get this, but I would appreciate help from anyone who knows these commands better.

Thanks in advance.

Upvotes: 2

Views: 185

Answers (2)

potong

Reputation: 58430

This might work for you (GNU sed):

sed -E 's/^[^"]*"([^"\]*(\\.[^"\]*)*)" */\1\n/;/^[^\n]*\n/P;D' file > file1

The sed invocation whittles down each line in file, removing any non-words (strings not surrounded by double quotes) and placing a newline after each recognised word. Thus each line of file1 will contain one double-quoted word, less its surrounding double quotes.

N.B. The regexp treats any character following a \ as part of the word, so an escaped quote (\") does not end it.
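Since the question's expected output keeps the surrounding double quotes, one way to restore them is a second sed pass over the result. This is only a sketch: it assumes GNU sed (as the command above does), and the wrapper name `quoted_words` is invented for the example.

```shell
# Hypothetical wrapper around the sed command above (GNU sed assumed).
quoted_words() {
  sed -E 's/^[^"]*"([^"\]*(\\.[^"\]*)*)" */\1\n/;/^[^\n]*\n/P;D' "$1" |
    sed 's/.*/"&"/'    # wrap each extracted word in double quotes again
}

# Demo on the sample input from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
The "quick" "\(brown)" fox
jumps "over \n\"the $?@%\"" fence
EOF
quoted_words "$sample"
rm -f "$sample"
```

Each output line is then one quoted string, ready to be read into an array line by line.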

Upvotes: 1

thanasisp

Reputation: 5975

Here is an attempt with awk, but it needs more testing; I only tested it against the sample input.

> cat test.awk

BEGIN { RS="\"" }
p { printf "%s", $0 }
($0 ~ /\\$/) { if (p) { printf "%s", "\"" }; next }
{ if (p) { p=0 } else { p=1; printf "\n" } }

p is the printing mode and RS is the double quote, so every record ends where a double quote appears in the input. We do not switch the printing mode when we find an escaped double quote, that is, when the record ends with a backslash.

> cat file
The "quick" "\(brown)" fox
jumps "over \n\"the $?@%\"" fence
> awk -f test.awk file

quick
\(brown)
over \n\"the $?@%\"
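The output above starts with an empty line and omits the surrounding quotes. A possible variant, sketched here under the same assumptions as test.awk and likewise only reasoned through against the sample input, buffers each quoted string and prints it with its quotes once it closes. The names `quoted_records` and `buf` are hypothetical.

```shell
# Hypothetical variant of test.awk: collect each quoted string, print it with quotes.
quoted_records() {
  awk 'BEGIN { RS = "\"" }               # split the input on double quotes
  p { buf = buf $0 }                     # inside a quoted string: collect the record
  /\\$/ { if (p) buf = buf "\""; next }  # record ends in a backslash: escaped quote, keep mode
  { if (p) { print "\"" buf "\""; buf = ""; p = 0 } else p = 1 }' "$@"
}

# Demo on the sample input from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
The "quick" "\(brown)" fox
jumps "over \n\"the $?@%\"" fence
EOF
quoted_records "$sample"
rm -f "$sample"
```

Because nothing is printed until a string closes, there is no leading blank line, and an unterminated quote at the end of the file produces no output.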

Upvotes: 1
