Reputation: 25
I have an external program that hands my script a bunch of information via stdin ($1).
I get a line like the following:
session number="2018/06/20-234",data name="XTRDF_SLSLWX3_FSLO",data group="Testing",status="Error",data type="0"
Now I want to split this line into individual variables.
So far I have thought of two approaches:
INPUT='session number="2018/06/20-234",data name="XTRDF_SLSLWX3_FSLO",data group="Testing",status="Error",data type="0"'
echo "$INPUT" | tr ',' '\n' | tr ' ' '_' > vars.tmp
source ./vars.tmp
This does the job until a value such as the data name contains a space: my tr command automatically changes it to _, and the assigned variable is no longer correct in later checks.
So I thought about loading the input into an array, using pattern substitution on the array to delete everything up to and including the =, and then assigning the variables:
INPUT='session number="2018/06/20-234",data name="XTRDF_SLSLWX3_FSLO",data group="Testing",status="Error",data type="0"'
IFS=',' read -r -a array <<< "$INPUT"
array=("${array[@]/#*=/}")
session_number="${array[0]}"
data_name="${array[1]}"
....
But now I get strange behaviour: the input is cut short if there is a = somewhere in the data name or data group, and I have no idea whether this is the right way to do it. I'm pretty sure that, unlike a space, a = should never appear in the data name or data group field, but you never know...
How could I do this?
Upvotes: 0
Views: 36
Reputation: 295638
If you don't need to worry about commas or literal quotes inside the quoted data, the following handles the case you asked about (stray =s within the data) sanely:
#!/usr/bin/env bash
case $BASH_VERSION in ''|[123].*) echo "ERROR: Requires bash 4.0 or newer" >&2; exit 1;; esac
input='session number="2018/06/20-234",data name="XTRDF_SLSLWX3_FSLO",data group="Testing",status="Error",data type="0"'
declare -A data=( )
IFS=, read -r -a pieces <<<"$input"
for piece in "${pieces[@]}"; do
  key=${piece%%=*}    # delete everything from the *first* "=" onward, ignoring later ones
  value=${piece#*=}   # delete everything up to and including the *first* "=", ignoring later ones
  value=${value#'"'}  # remove leading quote
  value=${value%'"'}  # remove trailing quote
  data[$key]=$value
done
declare -p data
...results in (whitespace added for readability, otherwise literal output):
declare -A data=(
["data type"]="0"
[status]="Error"
["data group"]="Testing"
["data name"]="XTRDF_SLSLWX3_FSLO"
["session number"]="2018/06/20-234"
)
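As for why the attempt in the question cut values short: `${var/#pattern/}` replaces the longest match of the pattern, so `*=` swallows everything through the last =, whereas `${var#pattern}` removes the shortest matching prefix. A minimal sketch of the difference, using a made-up piece with an = inside the value:

```shell
#!/usr/bin/env bash
# Hypothetical piece (not from the question) with an "=" inside the quoted value.
piece='data name="A=B"'

greedy=${piece/#*=/}  # ${var/#pattern/}: longest match, strips through the LAST "="
first=${piece#*=}     # ${var#pattern}: shortest prefix, strips through the FIRST "="

printf '%s\n' "$greedy" "$first"
```

The first line printed is B" and the second is "A=B", which is why the loop above uses the # and %% forms rather than pattern substitution.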
Now, let's say you do need to worry about commas inside your quotes! Consider the following input:
input='session number="123",error="Unknown, please try again"'
Now, if we try to split on commas without considering their position, we'll have error="Unknown as one piece and have please try again" as a stray value.
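To see the problem concretely, here is a small sketch that runs the same naive comma split on that input and prints each piece in angle brackets:

```shell
#!/usr/bin/env bash
input='session number="123",error="Unknown, please try again"'

# Naive split on every comma, whether or not it sits inside quotes.
IFS=, read -r -a pieces <<<"$input"

echo "pieces: ${#pieces[@]}"
printf '<%s>\n' "${pieces[@]}"
```

This reports three pieces rather than two, and the third one has no key at all, so the key/value loop would store garbage for it.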
To solve this, we can use GNU awk with the FPAT feature.
#!/usr/bin/env bash
case $BASH_VERSION in ''|[123].*) echo "ERROR: Requires bash 4.0 or newer" >&2; exit 1;; esac
input='session number="123",error="Unknown, please try again"'
# Why do so many awk people try to write one-liners? Isn't this more readable?
awk_script='
  BEGIN {
    FPAT = "[^=,]+=(([^,]+)|(\"[^\"]+\"))"
  }
  {
    printf("%s\0", NF)
    for (i = 1; i <= NF; i++) {
      printf("%s\0", $i)
    }
  }
'
while :; do
  IFS= read -r -d '' num_fields || break
  declare -A data=( )
  for ((i=0; i<num_fields; i++)); do
    IFS= read -r -d '' piece || break
    key=${piece%%=*}
    value=${piece#*=}
    value=${value#'"'}
    value=${value%'"'}
    data[$key]=$value
  done
  declare -p data # maybe invoke a callback here, before going on to the next line
done < <(gawk "$awk_script" <<<"$input")
...after which the output is, correctly:
declare -A data=(["session number"]="123" [error]="Unknown, please try again" )
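Either way, once the pairs are in the associative array, the individual variables the question asked for are just lookups by key. A rough sketch, using the simpler comma split and the line from the question; the status check and the message at the end are only an illustration, not something the external program requires:

```shell
#!/usr/bin/env bash
# Requires bash 4.0 or newer for associative arrays.
input='session number="2018/06/20-234",data name="XTRDF_SLSLWX3_FSLO",data group="Testing",status="Error",data type="0"'

declare -A data=( )
IFS=, read -r -a pieces <<<"$input"
for piece in "${pieces[@]}"; do
  key=${piece%%=*}
  value=${piece#*=}
  value=${value#'"'}
  value=${value%'"'}
  data[$key]=$value
done

# Keys keep their spaces, so quote them when looking values up.
session_number=${data["session number"]}
data_name=${data["data name"]}

# Illustrative check only: react to a failed status.
if [[ ${data[status]} == Error ]]; then
  echo "session $session_number: $data_name failed"
fi
```

Because the keys are kept verbatim, nothing has to be renamed with tr, and a data name containing spaces survives intact.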
Upvotes: 1