Pujan Paudel

Get the results of match using regex pattern on complex objects

I have a big JSON file that I am parsing with jq. I am using a regex to select the objects whose "com" attribute contains a certain pattern (a hashtag such as #YOLO). It works perfectly fine when I just do a basic select and return only the entries that matched. My query looks like:

jq '.["posts"][] | select(.com|test("#(?!(p[0-9])|([0-9])|(q[0-9]|_))[a-zA-Z0-9]")) | .com' jsontest.json > oops.txt

jsontest.json looks like this:

{"posts": [{"archived_on": 3241233, "replies": 132,"com": "Life is good , and I don't want to take anything away from it . Literally #YOLO"}]}
{"posts": [{"archived_on": 456343423, "replies": 150,"com": "The premier league is returning and I am very excited for it "}]}

Output:

"Life is good , and I don't want to take anything away from it . Literally #YOLO"
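To make the lookahead in this pattern concrete, here is a small sketch that uses Python's re module rather than jq. It assumes Python's engine treats this particular pattern the same way jq's Oniguruma engine does (both support (?!...) negative lookahead). The helper name accepts and the sample hashtags are made up for illustration. The pattern accepts a # followed by a letter or digit, unless what follows is p plus a digit, a digit, q plus a digit, or an underscore.

```python
import re

# Pattern copied from the question.
PATTERN = r"#(?!(p[0-9])|([0-9])|(q[0-9]|_))[a-zA-Z0-9]"

def accepts(text: str) -> bool:
    """Hypothetical helper, a rough analog of jq's test(PATTERN):
    True if the pattern matches anywhere in text."""
    return re.search(PATTERN, text) is not None

for tag in ["#YOLO", "#pizza", "#q", "#p1", "#5", "#q2", "#_x"]:
    print(tag, accepts(tag))
# #YOLO, #pizza and #q print True; #p1, #5, #q2 and #_x print False
```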

I want to use the match(regex) or capture(regex) function to also get the individual matches themselves, which in the above case would be #YOLO, the text that caused the regex to match.

I have been stuck on this problem for a few hours now. I would be really grateful if anyone could show me how to achieve this.

Answers (1)

peak

One way to show the match that's made by a call to test is to use the idiom match(REGEX).string, so that in your case you could modify your program slightly to read as follows:

.["posts"][]
| select(.com|test("#(?!(p[0-9])|([0-9])|(q[0-9]|_))[a-zA-Z0-9]"))
| .com
| match("#(?!(p[0-9])|([0-9])|(q[0-9]|_))[a-zA-Z0-9]")
| .string

This would, however, return "#Y", because the final character class [a-zA-Z0-9] matches exactly one character. Since your question indicates you want "#YOLO", you will want something more like the following (notice the +, which matches one or more characters):

.["posts"][]
| select(.com|test("#(?!(p[0-9])|([0-9])|(q[0-9]|_))[a-zA-Z0-9]"))
| .com
| match("#(?!(p[0-9])|([0-9])|(q[0-9]|_))[a-zA-Z0-9]+")
| .string
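The difference between the two patterns can be checked outside jq with a rough Python analog. This is a sketch, not jq itself: first_match is a hypothetical helper that builds a dict with the offset, length and string keys that jq's match also reports (jq additionally includes a captures array), and it assumes Python's re behaves like Oniguruma for this pattern.

```python
import re

COM = "Life is good , and I don't want to take anything away from it . Literally #YOLO"

# Without "+": the last character class consumes exactly one character.
ONE_CHAR = r"#(?!(p[0-9])|([0-9])|(q[0-9]|_))[a-zA-Z0-9]"
# With "+": it consumes one or more characters.
ONE_OR_MORE = r"#(?!(p[0-9])|([0-9])|(q[0-9]|_))[a-zA-Z0-9]+"

def first_match(pattern: str, text: str):
    """Hypothetical helper: a dict resembling jq's match() output for the
    first match only, or None when nothing matches."""
    m = re.search(pattern, text)
    if m is None:
        return None
    return {"offset": m.start(), "length": m.end() - m.start(), "string": m.group(0)}

print(first_match(ONE_CHAR, COM)["string"])     # #Y
print(first_match(ONE_OR_MORE, COM)["string"])  # #YOLO
```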

A more efficient solution

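It would be more efficient to eliminate the call to test altogether. A string that does not match produces no output from match, so the non-matching posts are dropped anyway: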

.posts[].com
| match("#(?!(p[0-9])|([0-9])|(q[0-9]|_))[a-zA-Z0-9]+")
| .string
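Here is a rough Python analog of the efficient filter, run over the sample jsontest.json (a stream of two JSON documents, one per line). The helper hashtags is hypothetical. It also illustrates one point worth knowing: without the "g" flag, match reports only the first match in each string, so a comment containing several hashtags yields only the first one. In jq, match(REGEX; "g") emits all of them, which the all_matches=True branch mimics with re.finditer.

```python
import json
import re

PATTERN = r"#(?!(p[0-9])|([0-9])|(q[0-9]|_))[a-zA-Z0-9]+"

# jsontest.json from the question: a stream of JSON documents, one per line.
JSONTEST = """\
{"posts": [{"archived_on": 3241233, "replies": 132,"com": "Life is good , and I don't want to take anything away from it . Literally #YOLO"}]}
{"posts": [{"archived_on": 456343423, "replies": 150,"com": "The premier league is returning and I am very excited for it "}]}
"""

def hashtags(stream: str, all_matches: bool = False):
    """Hypothetical analog of `.posts[].com | match(PATTERN) | .string`.

    With all_matches=True it mimics `match(PATTERN; "g")` instead.
    """
    for line in stream.splitlines():
        if not line.strip():
            continue
        for post in json.loads(line)["posts"]:
            com = post["com"]
            if all_matches:
                for m in re.finditer(PATTERN, com):
                    yield m.group(0)
            else:
                m = re.search(PATTERN, com)
                # Non-matching posts yield nothing, so no separate test() is needed.
                if m is not None:
                    yield m.group(0)

print(list(hashtags(JSONTEST)))  # ['#YOLO']
print(list(hashtags('{"posts": [{"com": "#one and #two"}]}', all_matches=True)))  # ['#one', '#two']
```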

Using capture

Simply wrap the REGEX in a named capture group of the form (?<x>REGEX), then extract the captured text with .x. For example:

.posts[].com
| capture("(?<x>#(?!(p[0-9])|([0-9])|(q[0-9]|_))[a-zA-Z0-9]+)")
| .x
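A comparable sketch in Python, again as an analog rather than jq itself. Note that Python spells a named group (?P<x>...) instead of (?<x>...), and the hypothetical capture helper returns None where jq's capture would simply produce no output. As in jq, only the named group appears in the result; the unnamed groups inside the lookahead are left out.

```python
import re

# Python's named-group syntax is (?P<x>...); jq (Oniguruma) uses (?<x>...).
PATTERN = r"(?P<x>#(?!(p[0-9])|([0-9])|(q[0-9]|_))[a-zA-Z0-9]+)"

def capture(pattern: str, text: str):
    """Hypothetical helper, a rough analog of jq's capture(): a dict of the
    named groups of the first match, or None where jq would output nothing."""
    m = re.search(pattern, text)
    return m.groupdict() if m else None

com = "Life is good , and I don't want to take anything away from it . Literally #YOLO"
result = capture(PATTERN, com)
print(result)       # {'x': '#YOLO'}
print(result["x"])  # #YOLO
```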
