ndrone

Reputation: 3582

Regex Pattern matching refinement

I have JSON that is returned to a variable, and I'm trying to grab only certain values from it. I'm limited to grep, sed, and awk.

RESULTS='{ "results" : [ { "repo" : "appdeploy", "path" : "org/test/cxp/python/1.0-SNAPSHOT", "name" : "python-1.0-20170519.130808-42.jar" } ], "range" : { "start_pos" : 0, "end_pos" : 1, "total" : 1 } }'
echo $RESULTS | grep -o '"path" : "(.*)",'

This returns:

"path" : "org/test/cxp/python/1.0-SNAPSHOT",

and honestly the only part I want is

org/test/cxp/python/1.0-SNAPSHOT

Upvotes: 1

Views: 65

Answers (3)

dawg

Reputation: 103864

It is best to use a JSON parser for this kind of task.

Python, Ruby, and Perl all have robust JSON parsers available.

Here is an example in Python:

$ python3 -c '
import json
import sys

# Parse the JSON on stdin and print the path of the first result
print(json.load(sys.stdin)["results"][0]["path"])
' <<< "$RESULTS"
org/test/cxp/python/1.0-SNAPSHOT

Upvotes: 0

Wiktor Stribiżew

Reputation: 626893

With jq, you could use the '.results[0] | .path' filter. You can also experiment with this tool online.

However, if you have no access to jq, you may use a PCRE-based grep command such as

grep -oP '(?<="path" : ")[^"]+'

The -P option enables PCRE regex syntax, which supports lookarounds: they only check whether a pattern matches at a position and do not include the matched text in the returned match value.

Pattern details

  • (?<="path" : ") - a positive lookbehind that matches a position immediately preceded by the substring "path" : "
  • [^"]+ - a negated bracket expression that matches and consumes (adds to the match value) one or more chars other than ".

Here is a grep demo:

RESULTS='{ "results" : [ { "repo" : "appdeploy", "path" : "org/test/cxp/python/1.0-SNAPSHOT", "name" : "python-1.0-20170519.130808-42.jar" } ], "range" : { "start_pos" : 0, "end_pos" : 1, "total" : 1 } }'
echo $RESULTS | grep -oP '(?<="path" : ")[^"]+'

This prints org/test/cxp/python/1.0-SNAPSHOT.
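Some grep builds (BusyBox, for instance) do not support -P. In that case, one possible fallback is to let awk split the line on double quotes and print the field that follows a "path" key. This is only a rough sketch: the function name extract_path_awk is made up for illustration, and it assumes the values contain no escaped double quotes.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer): print the value of
# every "path" key found in a JSON string.
# Assumption: no value contains an escaped double quote (\").
extract_path_awk() {
    printf '%s\n' "$1" | awk -F'"' '{
        # With " as the separator, a key, its " : " separator and its value
        # land in three consecutive fields.
        for (i = 1; i <= NF - 2; i++)
            if ($i == "path" && $(i + 1) ~ /^ *: *$/)
                print $(i + 2)
    }'
}

RESULTS='{ "results" : [ { "repo" : "appdeploy", "path" : "org/test/cxp/python/1.0-SNAPSHOT", "name" : "python-1.0-20170519.130808-42.jar" } ], "range" : { "start_pos" : 0, "end_pos" : 1, "total" : 1 } }'
extract_path_awk "$RESULTS"
```

Like the grep version, this prints one line per "path" key it finds, so it should also cope with several results in the same response.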

Upvotes: 1

Leah Zorychta

Reputation: 13419

Here you go, using both grep and sed:

echo $RESULTS | grep -oP '"path" :\s"[^"]*"' | sed 's/"//g' | sed 's/path : //g'

This works in three steps. First, echo $RESULTS | grep -oP '"path" :\s"[^"]*"' produces "path" : "org/test/cxp/python/1.0-SNAPSHOT". Then the first call, sed 's/"//g', strips out the double quotes, and the second call, sed 's/path : //g', strips out the leading path : text. Note that the option must be an uppercase -P (PCRE mode); GNU grep rejects a lowercase -p.
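The grep stage and the two sed stages could probably be collapsed into a single sed call that captures the value in a group and prints only that group. This is a minimal sketch, assuming the exact "path" : "..." spacing shown in the question; the function name extract_path_sed is invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer): extract the "path"
# value with one sed substitution.
# Assumptions: the key is written exactly as "path" : "..." and the value
# contains no double quotes.
extract_path_sed() {
    # -n suppresses default output; the trailing p prints only lines where
    # the substitution succeeded. \( \) captures the value, \1 emits it.
    printf '%s\n' "$1" | sed -n 's/.*"path" : "\([^"]*\)".*/\1/p'
}

RESULTS='{ "results" : [ { "repo" : "appdeploy", "path" : "org/test/cxp/python/1.0-SNAPSHOT", "name" : "python-1.0-20170519.130808-42.jar" } ], "range" : { "start_pos" : 0, "end_pos" : 1, "total" : 1 } }'
extract_path_sed "$RESULTS"
```

Because the leading .* is greedy, this returns only the last "path" value on a line, so it suits responses with a single result.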

Upvotes: 0
