T-One

Reputation: 25

Assign and/or manipulate incoming variables (string) from external program in bash

I have an external program that passes a bunch of information to my script as an argument ($1).

I get a line like the following:

session number="2018/06/20-234",data name="XTRDF_SLSLWX3_FSLO",data group="Testing",status="Error",data type="0"

Now I want to split this line into individual variables.

So far I have thought of two approaches:

INPUT='session number="2018/06/20-234",data name="XTRDF_SLSLWX3_FSLO",data group="Testing",status="Error",data type="0"'
echo "$INPUT" | tr ',' '\n' | tr ' ' '_' > vars.tmp
set vars.tmp

This works until a value such as the data name contains a space: my tr command changes that space to _ as well, so the assigned variable is no longer correct in later checks.

So I thought about loading the input into an array, using pattern substitution on the array to delete everything up to and including the =, and assigning the variables afterwards:

INPUT='session number="2018/06/20-234",data name="XTRDF_SLSLWX3_FSLO",data group="Testing",status="Error",data type="0"'
IFS=',' read -r -a array <<< "$INPUT"
array=("${array[@]/#*=/}")
session_number="${array[0]}"
data_name="${array[1]}"
....

But now I get strange behaviour: the value is cut short if there is a = somewhere in the data name or data group, and I have no idea whether this is the right way to do it. Unlike a space, a = should never appear in the data name or data group field, but you never know...

How could I do this?

Upvotes: 0

Views: 36

Answers (1)

Charles Duffy

Reputation: 295638

Simple Case: No Commas Within Strings

If you don't need to worry about commas or literal quotes inside the quoted data, the following handles the case you asked about (stray =s within the data) sanely:

#!/usr/bin/env bash
case $BASH_VERSION in ''|[123].*) echo "ERROR: Requires bash 4.0 or newer" >&2; exit 1;; esac

input='session number="2018/06/20-234",data name="XTRDF_SLSLWX3_FSLO",data group="Testing",status="Error",data type="0"'
declare -A data=( )
IFS=, read -r -a pieces <<<"$input"
for piece in "${pieces[@]}"; do
  key=${piece%%=*}   # keep the key: delete the *first* "=" and everything after it
  value=${piece#*=}  # keep the value: delete everything up to and including the *first* "="
  value=${value#'"'} # remove leading quote
  value=${value%'"'} # remove trailing quote
  data[$key]=$value
done
declare -p data

...results in (whitespace added for readability, otherwise literal output):

declare -A data=(
  ["data type"]="0"
  [status]="Error"
  ["data group"]="Testing"
  ["data name"]="XTRDF_SLSLWX3_FSLO"
  ["session number"]="2018/06/20-234"
)
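
As for the strange cutting seen in the question: a minimal sketch of the difference, using a hypothetical value with a stray = in it. Pattern substitution (`${var/#pattern/}`) always takes the longest match, so `*=` removes everything through the last =, while prefix removal with a single `#` takes the shortest match and stops at the first =.

```shell
#!/usr/bin/env bash
piece='data name="A=B"'   # hypothetical value containing a stray "="

greedy=${piece/#*=/}   # substitution: longest match, removes through the LAST "="
short=${piece#*=}      # prefix removal: shortest match, removes through the FIRST "="

printf 'substitution: %s\n' "$greedy"   # prints: substitution: B"
printf 'prefix strip: %s\n' "$short"    # prints: prefix strip: "A=B"
```

That is why the loop above uses `${piece#*=}` rather than a substitution.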

Handling Commas Inside Quotes

Now, let's say you do need to worry about commas inside your quotes! Consider the following input:

input='session number="123",error="Unknown, please try again"'

If we split on commas without regard to whether they fall inside quotes, we get error="Unknown as one piece and please try again" as a stray piece with no key.
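
A quick way to see the problem is to run the naive comma split from the simple case against this input. This is only an illustration of the failure, not something to build on:

```shell
#!/usr/bin/env bash
input='session number="123",error="Unknown, please try again"'

# Split on every comma, including the one inside the quoted value.
IFS=, read -r -a pieces <<<"$input"

echo "${#pieces[@]} pieces"      # 3 pieces, although the input has only 2 fields
printf '<%s>\n' "${pieces[@]}"   # the brackets make the leading space on the stray piece visible
```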

To solve this, we can use GNU awk with the FPAT feature, which describes what a field looks like rather than what separates fields.

#!/usr/bin/env bash
case $BASH_VERSION in ''|[123].*) echo "ERROR: Requires bash 4.0 or newer" >&2; exit 1;; esac

input='session number="123",error="Unknown, please try again"'

# Why do so many awk people try to write one-liners? Isn't this more readable?
awk_script='
BEGIN {
    FPAT = "[^=,]+=(([^,]+)|(\"[^\"]+\"))"
}

{
    printf("%s\0", NF)
    for (i = 1; i <= NF; i++) {
        printf("%s\0", $i)
    }
}
'

while :; do
  IFS= read -r -d '' num_fields || break
  declare -A data=( )
  for ((i=0; i<num_fields; i++)); do
    IFS= read -r -d '' piece || break
    key=${piece%%=*}
    value=${piece#*=}
    value=${value#'"'}
    value=${value%'"'}
    data[$key]=$value
  done
  declare -p data # maybe invoke a callback here, before going on to the next line
done < <(gawk "$awk_script" <<<"$input")

...after which the output is correct:

declare -A data=(["session number"]="123" [error]="Unknown, please try again" )
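
If you still want plain variables like the session_number from the question, here is a minimal sketch of reading them back out of the array. The hard-coded array is a stand-in for the one the loop builds, and the "unknown" default is only an assumption about what you would want for a missing field:

```shell
#!/usr/bin/env bash
# Stand-in for the array produced by the loop above (requires bash 4.0+).
declare -A data=(
  ["session number"]="123"
  [error]="Unknown, please try again"
)

# Copy individual fields into plain variables; quote keys that contain spaces.
session_number=${data["session number"]}
error=${data[error]}

# "status" is absent from this input, so fall back to a default value.
status=${data[status]-unknown}

printf 'session_number=%s\nerror=%s\nstatus=%s\n' "$session_number" "$error" "$status"
```

Looking values up by key also means the code does not depend on the order in which the external program emits the fields.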

Upvotes: 1
