Reputation: 1125
I have a single string in this kind of format:
"Mike H<[email protected]>" [email protected] "Mike H<[email protected]>"
If I were writing a normal regex in JS, C#, etc., I'd do this:
(?:"(.+?)"|'(.+?)'|(\S+))
And iterate the match groups to grab each string, ideally without the quotes. I ultimately want to add each value to an array, so in the example, I'd end up with 3 items in an array as follows:
Mike H<[email protected]>
[email protected]
Mike H<[email protected]>
I can't figure out how to replicate this functionality with grep, sed, or bash regexes. I've tried some things like
echo "$email" | grep -oP "\"\K(.+?)(?=\")|'\K(.+?)(?=')|(\S+)"
The problem with this is that while it kind of mimics the functionality of capture groups, it doesn't really work with multiples, so I get captures like
"Mike
H<[email protected]>"
[email protected]
If I remove the lookahead/lookbehind logic, I at least get the 3 strings, but the first and last are still wrapped in quotes. In that approach, I pipe the output to read so I can individually add each string to the array, but I'm open to other options.
EDIT:
I think my input example may have been confusing; it's just one possible input. The real input could be double-quoted, single-quoted, or unquoted (without spaces) strings, in any order and in any quantity. The JavaScript/C# regex I provided is the real behavior I'm trying to achieve.
Upvotes: 6
Views: 18006
Reputation: 5347
Your first expression is fine; just be careful with the quoting (use single quotes around the pattern when \ is present). At the end, trim the " with sed.
$ echo "$email" | grep -Po '".*?"|\S+' | sed -r 's/"$|^"//g'
Mike H<[email protected]>
[email protected]
Mike H<[email protected]>
Upvotes: 1
Reputation: 1125
What I was able to do that worked, but wasn't as concise as I wanted the code to be:
arr=()
while IFS= read -r line; do
    line="${line//\"/}"
    arr+=("${line//\'/}")
done < <(echo "$email" | grep -oP "\"(.+?)\"|'(.+?)'|(\S+)")
This gave me an array of the capturing group and handled the input in any order, wrapped in double or single quotes or none at all if it didn't have a space. It also provided the elements in the array without the wrapping quotes. Appreciate all of the suggestions.
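One caveat with the loop above: ${line//\"/} removes every double quote in the token, including any that sit inside it. If only the wrapping pair should go, something along these lines might do. This is a minimal sketch; strip_wrapping_quotes is a made-up helper name, not part of the original code.

```bash
#!/usr/bin/env bash
# Hypothetical helper: remove ONE pair of matching wrapping quotes
# (double or single) and leave any inner quotes untouched.
strip_wrapping_quotes() {
    local s=$1
    if [[ ${#s} -ge 2 ]]; then
        local first=${s:0:1} last=${s: -1}
        # Only strip when the first and last characters are the same quote.
        if [[ $first == "$last" && ( $first == '"' || $first == "'" ) ]]; then
            s=${s:1:${#s}-2}
        fi
    fi
    printf '%s\n' "$s"
}
```

Inside the loop it would be used as arr+=("$(strip_wrapping_quotes "$line")") in place of the two substitutions.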
Upvotes: 0
Reputation: 37404
Using GNU awk and FPAT to define fields by content:
$ awk '
BEGIN { FPAT="([^ ]*)|(\"[^\"]*\")" }   # define a field to be space-separated or in quotes
{
    for(i=1;i<=NF;i++) {                # iterate every field
        gsub(/^\"|\"$/,"",$i)           # remove leading and trailing quotes
        print $i                        # output
    }
}' file
Mike H<[email protected]>
[email protected]
Mike H<[email protected]>
Upvotes: 0
Reputation: 92854
gawk + bash solution (adding each item to array):
email_str='"Mike H<[email protected]>" [email protected] "Mike H<[email protected]>"'
readarray -t email_arr < <(awk -v FPAT="[^\"'[:space:]]+[^\"']+[^\"'[:space:]]+" \
'{ for(i=1;i<=NF;i++) print $i }' <<<$email_str)
Now, all items are in email_arr
Accessing the 2nd item:
echo "${email_arr[1]}"
[email protected]
Accessing the 3rd item:
echo "${email_arr[2]}"
Mike H<[email protected]>
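Bash arrays are zero-indexed, so the Nth item lives at index N-1. A small sketch with made-up values (demo_arr and its contents are placeholders, not the real input) that lists each index next to its item:

```bash
#!/usr/bin/env bash
# Placeholder data standing in for the parsed items.
readarray -t demo_arr < <(printf '%s\n' 'first item' 'second' 'third item')

echo "count: ${#demo_arr[@]}"
# "${!demo_arr[@]}" expands to the list of indices: 0 1 2
for i in "${!demo_arr[@]}"; do
    printf '%d: %s\n' "$i" "${demo_arr[i]}"
done
```

With three items the valid indices are 0, 1 and 2; an index past the end expands to an empty string rather than raising an error.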
Upvotes: 1
Reputation: 3089
Modify your regex like this:
grep -oP '("?\s*)\K.*?(?=")' file
Output:
Mike H<[email protected]>
[email protected]
Mike H<[email protected]>
Upvotes: 0
Reputation: 18371
Using gawk, where RS can be a multi-character string or a regex:
awk -v RS='"|" ' 'NF' inputfile
Mike H<[email protected]>
[email protected]
Mike H<[email protected]>
Upvotes: 0
Reputation: 103844
You can use Perl:
$ email='"Mike H<[email protected]>" [email protected] "Mike H<[email protected]>"'
$ echo "$email" | perl -lane 'while (/"([^"]+)"|(\S+)/g) {print $1 ? $1 : $2}'
Mike H<[email protected]>
[email protected]
Mike H<[email protected]>
Or in pure Bash, it gets kinda wordy:
re='\"([^\"]+)\"[[:space:]]*|([^[:space:]]+)[[:space:]]*'
while [[ $email =~ $re ]]; do
    echo "${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
    i=${#BASH_REMATCH}
    email=${email:i}
done
# same output
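The question also asks for single-quoted strings and for the results to end up in an array. Below is a rough sketch of how the pure Bash loop could be stretched to cover that. The function name split_quoted, the global array tokens, and the example.com addresses are all made up for illustration. Each pattern is tried in order and anchored at the start of the remaining text, which mimics the first-alternative-wins behaviour of the JS/C# regex.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split $1 into the global array "tokens".
# Tokens may be "double quoted", 'single quoted', or bare (no spaces).
split_quoted() {
    local rest=$1
    local dq='^"([^"]+)"'          # "..." with the content in group 1
    local sq="^'([^']+)'"          # '...' with the content in group 1
    local bare='^([^[:space:]]+)'  # run of non-whitespace characters
    tokens=()
    while :; do
        # Drop leading whitespace from what is left.
        rest=${rest#"${rest%%[![:space:]]*}"}
        [[ -n $rest ]] || break
        # Try the patterns in order; BASH_REMATCH holds the first that matched.
        if [[ $rest =~ $dq || $rest =~ $sq || $rest =~ $bare ]]; then
            tokens+=("${BASH_REMATCH[1]}")
            rest=${rest:${#BASH_REMATCH[0]}}
        else
            break
        fi
    done
}

# Made-up input covering all three token styles.
split_quoted "\"Mike H<mike@example.com>\" 'Jane D' bob@example.com"
printf '%s\n' "${tokens[@]}"
```

An unbalanced quote falls through to the bare pattern, so it is kept as part of a plain token instead of stopping the loop.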
Upvotes: 6
Reputation: 4043
You may use sed to achieve that:
$ sed -r 's/"(.*)" (.*)"(.*)"/\1\n\2\n\3/g' <<< "$EMAIL"
Mike H<[email protected]>
[email protected]
Mike H<[email protected]>
Upvotes: 1