Reputation: 71
I have a big JSON file that I am parsing with jq. I am using a regex to select objects whose "com" attribute contains a certain pattern. It works perfectly well when I just do a basic select and return only the entries that matched. My query looks like this:
jq '.["posts"][] | select(.com|test("#(?!(p[0-9])|([0-9])|(q[0-9]|_))[a-zA-Z0-9]")) | .com' jsontest.json > oops.txt
The jsontest.json file looks like this:
{"posts": [{"archived_on": 3241233, "replies": 132,"com": "Life is good , and I don't want to take anything away from it . Literally #YOLO"}]}
{"posts": [{"archived_on": 456343423, "replies": 150,"com": "The premier league is returning and I am very excited for it "}]}
Output:
"Life is good , and I don't want to take anything away from it . Literally #YOLO"
I want to use the match(regex) or capture(regex) function so that I also get the individual match objects, which in the above case would be the #YOLO that caused the regex to match.
I have been stuck on this problem for a few hours now. I would be really grateful if anyone could guide me on how this could be achieved.
Upvotes: 0
Views: 392
Reputation: 116860
One way to show the match that's made by a call to test is to use the idiom match(REGEX).string, so that in your case you could modify your program slightly to read as follows:
.["posts"][]
| select(.com|test("#(?!(p[0-9])|([0-9])|(q[0-9]|_))[a-zA-Z0-9]"))
| .com
| match("#(?!(p[0-9])|([0-9])|(q[0-9]|_))[a-zA-Z0-9]")
| .string
This would, however, return "#Y", whereas your question indicates you want "#YOLO", so you will probably want something more like the following (notice the +):
.["posts"][]
| select(.com|test("#(?!(p[0-9])|([0-9])|(q[0-9]|_))[a-zA-Z0-9]"))
| .com
| match("#(?!(p[0-9])|([0-9])|(q[0-9]|_))[a-zA-Z0-9]+")
| .string
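To see the effect of the trailing + in isolation, here is a minimal sketch run from the shell. The one-post input and the variable names (input, re, short, long) are made up for illustration, and the regex is passed in with --arg only to avoid repeating it.

```shell
# Hypothetical one-post sample modelled on the question's data.
input='{"posts":[{"com":"Literally #YOLO"}]}'
re='#(?!(p[0-9])|([0-9])|(q[0-9]|_))[a-zA-Z0-9]'

# Without the trailing +, only one character after "#" is consumed.
short=$(printf '%s\n' "$input" |
  jq -r --arg re "$re" '.posts[].com | match($re) | .string')

# With the trailing +, the whole run of alphanumerics is consumed.
long=$(printf '%s\n' "$input" |
  jq -r --arg re "$re" '.posts[].com | match($re + "+") | .string')

echo "$short"   # prints: #Y
echo "$long"    # prints: #YOLO
```

The -r option prints the raw string rather than a JSON-quoted one.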
It would be more efficient to eliminate the call to test, since match emits nothing when the regex does not match:
.posts[].com
| match("#(?!(p[0-9])|([0-9])|(q[0-9]|_))[a-zA-Z0-9]+")
| .string
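If a single comment can contain several hashtags, note that match on its own reports only the first one. A possible variation, assuming you want every match, is to pass the "g" flag as a second argument. The two-post input below is invented for illustration.

```shell
re='#(?!(p[0-9])|([0-9])|(q[0-9]|_))[a-zA-Z0-9]+'

# Two hypothetical posts: one with two hashtags, one with none.
input='{"posts":[{"com":"Literally #YOLO and #fun"},{"com":"No tags here"}]}'

# The "g" flag makes match emit one object per match instead of only the first.
# The second post produces no output at all, so no test/select is needed.
tags=$(printf '%s\n' "$input" |
  jq -r --arg re "$re" '.posts[].com | match($re; "g") | .string')

echo "$tags"
# prints:
# #YOLO
# #fun
```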
capture
Simply wrap the REGEX in a named capture group of the form (?<x>REGEX), pass it to capture, and read the result with .x. For example:
.posts[].com
| capture("(?<x>#(?!(p[0-9])|([0-9])|(q[0-9]|_))[a-zA-Z0-9]+)")
| .x
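If you also want to keep the original comment next to the captured text, one option is to bind .com to a variable before calling capture. This is only a sketch: the group name tag, the variable $com, and the output layout are arbitrary choices, and the input is a shortened, made-up version of the question's data.

```shell
# Hypothetical one-post sample modelled on the question's data.
input='{"posts":[{"replies":132,"com":"Literally #YOLO"}]}'
re='(?<tag>#(?!(p[0-9])|([0-9])|(q[0-9]|_))[a-zA-Z0-9]+)'

# Remember the comment in $com, then capture the named group from it.
pairs=$(printf '%s\n' "$input" |
  jq -r --arg re "$re" \
    '.posts[] | .com as $com | ($com | capture($re)) | "\(.tag) <- \($com)"')

echo "$pairs"   # prints: #YOLO <- Literally #YOLO
```

Posts whose comment does not match produce no output, just as with match.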
Upvotes: 1