baba yaga

Reputation: 119

Unable to match pattern from JSON output

My bash script parses the JSON output of a build job and tries to determine the build status by extracting the value of the result key, which has three possible values: null, FAILURE, or SUCCESS.

I'm able to extract the value in the FAILURE and SUCCESS cases with the commands below:

val1=`curl -k -s $MY_URL ` 
output=`echo $val1 | sed -e 's/^.*"result":"\([^"]*\)".*$/\1/'`

but I'm unable to extract the value null with the above commands as it's not enclosed within double quotes.

JSON output for an ongoing build:

"keepLog":false,"number":10,"result":null,"timestamp":1456785876,

JSON output for a completed build:

"keepLog":false,"number":10,"result":"FAILURE","timestamp":1456785876,

How can I make the double quotes optional in the pattern so that only the bare value (null, FAILURE, or SUCCESS) is extracted?

Upvotes: 2

Views: 200

Answers (3)

Benjamin W.

Reputation: 52152

If your grep supports Perl-compatible regular expressions (PCRE), you can use this command:

grep -Po '"result":"?\K[^",]*(?="?,)' infile

where the contents of infile are

"keepLog":false,"number":10,"result":null,"timestamp":1456785876,
"keepLog":false,"number":10,"result":"FAILURE","timestamp":1456785876,
  • -o retains only the matched part
  • "result":"?\K matches the part before the \K, but doesn't include it in the match ("variable-length positive look-behind")
    • "? is an optional ", so both "result":" and "result": match
  • [^",]* matches any number of characters that are not either " or ,
  • (?="?,) is a positive look-ahead, i.e., the match must be followed by the pattern "?,: an optional " followed by a comma

If your grep does not support PCRE, you can use two commands like this (same input file):

grep -Eo '"result":"?[^",]*' infile | grep -o '[^":]*$'
  • -E is for extended regular expressions so we can use the ? modifier
  • -o is the same as above
  • "result":"?[^",]* matches both "result": and "result":" followed by any number of characters other than " or , – the output of the first command looks like this:

    "result":null
    "result":"FAILURE
    
  • In the second command, [^":]*$ matches any number of characters other than " or : at the end of the string, resulting in

    null
    FAILURE
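
To plug this into a script like the one in the question, the two-grep pipeline could be wrapped in a function and its output dispatched with case. This is only a sketch: the function name get_result and the variables ongoing and finished are made up for illustration, and it assumes the JSON arrives on a single line with no whitespace around the colon, as in the samples.

```shell
#!/usr/bin/env bash

# get_result: hypothetical helper. Reads JSON text on stdin and prints the
# value of the first "result" key without surrounding double quotes.
get_result() {
    grep -Eo '"result":"?[^",]*' | grep -o '[^":]*$' | head -n 1
}

# Sample inputs copied from the question; in the real script this text
# would come from the curl call instead.
ongoing='"keepLog":false,"number":10,"result":null,"timestamp":1456785876,'
finished='"keepLog":false,"number":10,"result":"FAILURE","timestamp":1456785876,'

for json in "$ongoing" "$finished"; do
    status=$(printf '%s\n' "$json" | get_result)
    case $status in
        null)    echo "build still running" ;;
        SUCCESS) echo "build succeeded" ;;
        FAILURE) echo "build failed" ;;
        *)       echo "unexpected result: '$status'" ;;
    esac
done
```

The catch-all branch is there so that an empty or unexpected value is reported rather than silently treated as one of the three known states.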
    

Upvotes: 1

Ed Morton

Reputation: 203655

Just use awk:

$ cat file
"keepLog":false,"number":10,"result":null,"timestamp":1456785876,
"keepLog":false,"number":10,"result":"FAILURE","timestamp":1456785876,

$ awk -F'[:",]+' '{print $7}' file
null
FAILURE

Or, if your log file contains more than you've shown us and you only need the "result" lines:

$ awk -F'[:",]+' '$6=="result"{print $7}' file
null
FAILURE

If that's not what you need, then edit your question to provide more representative sample input/output.
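
If the position of the result key is not fixed, one possible variation is to search the fields for the key name and print the field that follows it. This is a sketch rather than a tested solution for real build output: the function name result_by_name is hypothetical, and it assumes no value contains a colon, comma, or double quote, and that no string value is itself the word result.

```shell
#!/usr/bin/env bash

# result_by_name: hypothetical helper. Splits on runs of : " , as above,
# then prints the field immediately after the one equal to "result".
result_by_name() {
    awk -F'[:",]+' '{
        for (i = 1; i < NF; i++)
            if ($i == "result") { print $(i + 1); exit }
    }'
}

# The key sits at a different position in each sample line.
printf '%s\n' '"number":10,"result":null,"timestamp":1456785876,' | result_by_name
printf '%s\n' '"keepLog":false,"number":10,"result":"FAILURE","timestamp":1456785876,' | result_by_name
```

With the two sample lines this prints null and FAILURE, even though result is the fourth field in the first line and the sixth in the second.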

Upvotes: 0

halfbit

Reputation: 3464

You can use multiple groups in your regular expression like so:

echo $'..."result":null...\n..."result":"FAILURE"...' | \
sed -e 's/^.*"result":\("\([^"]*\)"\|\(null\)\).*$/\2\3/'

The above example outputs

null
FAILURE

The expression matches either the first (quoted) or the second (null) alternative of \|, never both. The corresponding groups are \2 and \3, so exactly one of them is non-empty and \2\3 yields the bare value.
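
Note that \| in a basic regular expression is a GNU sed extension. Where that is a concern, a variation using extended regular expressions with an optional quote on each side of the value might look like the sketch below. It assumes a sed that accepts -E (GNU and BSD sed do) and a value that contains no commas or double quotes; the function name strip_result is made up for illustration.

```shell
#!/usr/bin/env bash

# strip_result: hypothetical helper. The quote before and after the value
# is optional ("?), and the value itself is captured as \1.
# -n together with the p flag prints only lines where a substitution happened.
strip_result() {
    sed -nE 's/^.*"result":"?([^",]*)"?.*$/\1/p'
}

printf '%s\n' '..."result":null,...' '..."result":"FAILURE",...' | strip_result
```

Unlike the plain substitution, this version prints nothing for lines that have no result key, which makes a missing key easier to detect in a script.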

Upvotes: 2
