Reputation: 1889
I am putting together a bash script for parsing a URL into its components. I am stuck trying to figure out how to add an array value to a key within a JSON body.
I have parsed the following URL:
https://bar.foo.com/v2020/folders/8d55e749-bbd7-e811-9c19-3ca82a1e3f41/folders
This URL's path is:
URL_PATH: v2020/folders/8d55e749-bbd7-e811-9c19-3ca82a1e3f41/folders
The path is split into a parts array using:
IFS='/' read -ra URL_PATH_PARTS <<< "$URL_PATH"
URL_PATH_PARTS [4]: v2020 folders 8d55e749-bbd7-e811-9c19-3ca82a1e3f41 folders
I want to add the array to the JSON so that it is formatted as follows:
{
...
"parts": ["v2020", "folders", "8d55e749-bbd7-e811-9c19-3ca82a1e3f41", "folders"]
}
However, it currently looks like this, and I am not sure how best to take the next step:
{
...
"parts": "[v2020 folders 8d55e749-bbd7-e811-9c19-3ca82a1e3f41 folders]"
}
#!/usr/bin/env bash
HREF='https://bar.foo.com/v2020/folders/8d55e749-bbd7-e811-9c19-3ca82a1e3f41/folders'
# remove quotes
HREF=$(echo $HREF | tr -d '"')
echo " HREF: $HREF"
# extract the PROTOCOL
URL_PROTOCOL=$(echo $HREF | grep :// | sed -e's,^\(.*://\).*,\1,g')
echo " URL_PROTOCOL: $URL_PROTOCOL"
# extract the PROTOCOL SCHEME
URL_SCHEME=${URL_PROTOCOL%://}
echo " URL_SCHEME: $URL_SCHEME"
# remove the PROTOCOL -- updated
URL=$(echo $HREF | sed -e s,$URL_PROTOCOL,,g)
echo " URL: $URL"
# extract the host and port (strip any leading user@ part first)
URL_HOSTPORT=$(echo "$URL" | sed -e 's,^[^/]*@,,' | cut -d/ -f1)
echo " URL_HOSTPORT: $URL_HOSTPORT"
# by request host without port
URL_HOST="$(echo $URL_HOSTPORT | sed -e 's,:.*,,g')"
echo " URL_HOST: $URL_HOST"
# by request - try to extract the port
URL_PORT="$(echo $URL_HOSTPORT | sed -e 's,^.*:,:,g' -e 's,.*:\([0-9]*\).*,\1,g' -e 's,[^0-9],,g')"
echo " URL_PORT: $URL_PORT"
# Extract the path
URL_PATH="$(echo $URL | grep / | cut -d/ -f2-)"
echo " URL_PATH: $URL_PATH"
IFS='/' read -ra URL_PATH_PARTS <<< "$URL_PATH"
echo " URL_PATH_PARTS [${#URL_PATH_PARTS[@]}]: ${URL_PATH_PARTS[@]}"
URL_COMPONENTS="{ \
\"protocol\": \"$URL_PROTOCOL\", \
\"scheme\": \"$URL_SCHEME\", \
\"url\": \"$URL\", \
\"host\": \"$URL_HOST\", \
\"path\": \"$URL_PATH\", \
\"parts\": \"[${URL_PATH_PARTS[@]}]\" \
}"
echo -e "\n URL_COMPONENTS:"
echo $URL_COMPONENTS |
jq '.'
HREF: https://bar.foo.com/v2020/folders/8d55e749-bbd7-e811-9c19-3ca82a1e3f41/folders
URL_PROTOCOL: https://
URL_SCHEME: https
URL: bar.foo.com/v2020/folders/8d55e749-bbd7-e811-9c19-3ca82a1e3f41/folders
URL_HOST: bar.foo.com
URL_PATH: v2020/folders/8d55e749-bbd7-e811-9c19-3ca82a1e3f41/folders
URL_PATH_PARTS [4]: v2020 folders 8d55e749-bbd7-e811-9c19-3ca82a1e3f41 folders
URL_COMPONENTS:
{
"protocol": "https://",
"scheme": "https",
"url": "bar.foo.com/v2020/folders/8d55e749-bbd7-e811-9c19-3ca82a1e3f41/folders",
"host": "bar.foo.com",
"path": "v2020/folders/8d55e749-bbd7-e811-9c19-3ca82a1e3f41/folders",
"parts": "[v2020 folders 8d55e749-bbd7-e811-9c19-3ca82a1e3f41 folders]"
}
Appreciative of all feedback and suggestions!
Upvotes: 1
Views: 102
Reputation: 2664
Don't bother with the array. Use variable substitution:
URL_PATH_PARTS=${URL_PATH//\// } # Replace slashes with spaces
SPACES="${URL_PATH_PARTS//[^ ]} " # Append space to avoid fence-post error.
echo " URL_PATH_PARTS [${#SPACES}]: ${URL_PATH_PARTS}"
# Inside the URL_COMPONENTS string, replace each space with '", "':
...
\"parts\": [ \"${URL_PATH_PARTS// /\", \"}\" ] \
You could also do away with the intermediate 'URL_PATH_PARTS' variable (and lose some readability):
SLASHES="${URL_PATH//[^\/]}/" # Append slash to avoid fence-post error.
echo " URL_PATH_PARTS [${#SLASHES}]: ${URL_PATH//\// }"
# Inside the URL_COMPONENTS string, replace each slash with '", "':
...
\"parts\": [ \"${URL_PATH//\//\", \"}\" ] \
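For anyone who wants to try the substitution approach on its own, here is a minimal self-contained sketch. It assumes no path segment contains spaces, double quotes, or backslashes, since nothing is JSON-escaped. The SEP, PARTS_JSON and PART_COUNT names are introduced only for this illustration and are not part of the original script.

```bash
#!/usr/bin/env bash
# Sample path taken from the question.
URL_PATH='v2020/folders/8d55e749-bbd7-e811-9c19-3ca82a1e3f41/folders'

# Hypothetical helper: keeping the separator in a variable avoids
# escaping double quotes inside the parameter expansion.
SEP='", "'

# Replace every slash with '", "', then wrap the result in quotes and brackets.
PARTS_JSON="[ \"${URL_PATH//\//$SEP}\" ]"

# Count the parts: delete everything except slashes, then add one
# for the final segment (the fence-post correction).
SLASHES="${URL_PATH//[^\/]}"
PART_COUNT=$(( ${#SLASHES} + 1 ))

echo "URL_PATH_PARTS [$PART_COUNT]: $PARTS_JSON"
```

The value of PARTS_JSON can be dropped into the URL_COMPONENTS string unquoted, so that jq sees a JSON array rather than a string.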
Upvotes: 2
Reputation: 1889
Thanks @CharlesDuffy, @dash-o, @AndrewVickers.
I tried out all of your suggestions.
The approach I ended up taking was to use joelpurra/jq-hopkok:
#!/usr/bin/env bash
URL='"https://apiuatna11.springcm.com/v201411/folders/8d55e749-bbd7-e811-9c19-3ca82a1e3f41/folders"'
# URL to components
echo $URL | ./jq-hopkok/src/url/to-components.sh
{
"value": "https://apiuatna11.springcm.com/v201411/folders/8d55e749-bbd7-e811-9c19-3ca82a1e3f41/folders",
"valid": true,
"scheme": {
"value": "https",
"valid": true
},
"domain": {
"value": "apiuatna11.springcm.com",
"components": [
"apiuatna11.springcm.com",
"springcm.com",
"com"
],
"tld": "com",
"valid": true
},
"port": {
"value": null,
"separator": false,
"valid": true
},
"path": {
"value": "/v201411/folders/8d55e749-bbd7-e811-9c19-3ca82a1e3f41/folders",
"components": [
"v201411",
"folders",
"8d55e749-bbd7-e811-9c19-3ca82a1e3f41",
"folders"
],
"valid": true
},
"query": {
"value": null,
"separator": false,
"components": [],
"valid": true
},
"fragment": {
"value": null,
"separator": false,
"valid": true
}
}
Upvotes: 1
Reputation: 14491
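The current code uses \"parts\": \"[${URL_PATH_PARTS[@]}]\" for the parts, which puts the whole array inside a single JSON string. One possible solution is to iterate over the elements, building a combined string in which each element is quoted and the elements are separated by ',':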
PP=
for P1 in "${URL_PATH_PARTS[@]}" ; do
# Add ',' unless this is first item
[ "$PP" ] && PP="$PP, "
PP=$PP\"$P1\"
done
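To try the loop in isolation, here is a minimal self-contained sketch. It reuses the sample path from the question and assumes the parts contain no double quotes or backslashes, since nothing is JSON-escaped. The trimmed-down URL_COMPONENTS string at the end is only for illustration and is not the full object from the question.

```bash
#!/usr/bin/env bash
# Sample path taken from the question, split on '/' into an array.
URL_PATH='v2020/folders/8d55e749-bbd7-e811-9c19-3ca82a1e3f41/folders'
IFS='/' read -ra URL_PATH_PARTS <<< "$URL_PATH"

# Build a comma-separated list of double-quoted elements.
PP=
for P1 in "${URL_PATH_PARTS[@]}"; do
  # Add ', ' unless this is the first item.
  [ "$PP" ] && PP="$PP, "
  PP="$PP\"$P1\""
done

# Illustrative, shortened JSON body: note that [ $PP ] is NOT wrapped
# in escaped quotes, so it stays a JSON array instead of a string.
URL_COMPONENTS="{ \"path\": \"$URL_PATH\", \"parts\": [ $PP ] }"
echo "$URL_COMPONENTS"
```

Piping the echoed string through jq '.' should then show "parts" as an array of four strings.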
Then replace, in URL_COMPONENTS:
\"parts\": \"[${URL_PATH_PARTS[@]}]\"
With
\"parts\": [ $PP ]
Upvotes: 1