manderson

Reputation: 15

jq remove text before and after json

Ingesting another sourcetype that provides insane json output. It starts out like:

Sep  1 15:52:26 | IdentityValidationApi |  |  |  | {"header":{"tenantId":"X03LHWE3","requestType":"  ...

and has a pipe in between the request and the response, but both are on the same line:

..."serverTime":"2017-09-01T19:52:24.641Z"}}} | {"responseHeader":{"tenantID":  

and the json output ends with

...,"fieldValue":"Engineer"}]}}} | D2C CrossCore Request-Response | IdentityValidationApi.corp-dev.com | /api/Inquiry | 172.30.68.88 |  | True

I've tried jq, using jq .header[], but it hates that | in the middle of the event. End goal is to ingest the entire event into Splunk without the beginning or end text outside the json. Can someone suggest any steps here? Thank you.

Edit: I can use sed to pull out the beginning of the line, but am unsure how to combine that with removing the text from the end. Can I do that?

Upvotes: 0

Views: 3001

Answers (2)

jq170727

Reputation: 14695

While Jeff's answer pretty much sums it up, here's a specific example assembled from the sample data fragments. If the file named data contains

Sep  1 15:52:26 | IdentityValidationApi |  |  |  | {"header":{"tenantId":"X03LHWE3"}, "serverTime":"2017-09-01T19:52:24.641Z"} | {"responseHeader":{"tenantID": "...", "fieldValue":"Engineer"}} | D2C CrossCore Request-Response | IdentityValidationApi.corp-dev.com | /api/Inquiry | 172.30.68.88 |  | True

then

$ jq -M -Rc './"|" | .[5] | fromjson' data

will produce just the json fragment from column 5:

{"header":{"tenantId":"X03LHWE3"},"serverTime":"2017-09-01T19:52:24.641Z"}

This filter

$ jq -M -Rc './"|" | (.[5]|fromjson) + (.[6]|fromjson)' data

will combine the objects in columns 5 and 6 into one object:

{"header":{"tenantId":"X03LHWE3"},"serverTime":"2017-09-01T19:52:24.641Z","responseHeader":{"tenantID":"...","fieldValue":"Engineer"}}
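On the sed idea from the question's edit: both ends can be trimmed in one sed call by passing two expressions. This is only a rough sketch, assuming the leading text never contains a "{" and the trailing text never contains a "}"; the sample line is a shortened, made-up version of the fragments above. Note that the result still has the " | " between the request and the response, so it is two json objects on one line rather than a single json document.

```shell
# Shortened sample line, assembled from the fragments in the question (illustrative only).
line='Sep  1 15:52:26 | IdentityValidationApi |  |  |  | {"header":{"tenantId":"X03LHWE3"}} | {"responseHeader":{"fieldValue":"Engineer"}} | D2C CrossCore Request-Response | /api/Inquiry | 172.30.68.88 |  | True'

# First expression: delete everything before the first "{".
# Second expression: delete everything after the last "}".
trimmed=$(printf '%s\n' "$line" | sed -e 's/^[^{]*//' -e 's/[^}]*$//')

printf '%s\n' "$trimmed"
# prints: {"header":{"tenantId":"X03LHWE3"}} | {"responseHeader":{"fieldValue":"Engineer"}}
```

If the trailing columns could ever contain a "}", this would cut in the wrong place, which is one reason to prefer splitting on the pipe with jq.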

Upvotes: 0

Jeff Mercado

Reputation: 134521

jq is designed to work with json data. Your input is not pure json. If you can make certain assumptions about your input, then you can probably process the json parts. Any deviation in any of the inputs will break things.

  1. the pipe (|) is only used as a delimiter throughout the file, kind of like a "pipe separated values" file (a la csv but with no escape sequences)
    jq can read each line as a raw string; if pipes really are used only as delimiters, splitting on them is all the parsing we need
  2. each record occupies exactly one row and never spans multiple rows
    without parsing the data or assuming some pattern in the file, it would be impossible to know which lines belong to a single item and where a new one starts
  3. your json data will always be found in a fixed column of the psv row
    again, without further processing it would be impossible to know where the request or response parts are in the row if they aren't in fixed places
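
A quick way to sanity-check the first and third assumptions might be to count the pipe-separated fields on every line and see whether the count is always the same. This is just a sketch on two made-up lines (the sample variable is hypothetical); on a real file you would point awk at the file instead. A single count does not prove the assumptions, but more than one distinct count is a strong hint that a pipe shows up inside the data somewhere.

```shell
# Two made-up lines standing in for the real log (illustrative only).
sample='a | b | {"k":1} | c
d | e | {"k":2} | f'

# Print the number of pipe-separated fields per line, then collapse duplicates.
counts=$(printf '%s\n' "$sample" | awk -F'|' '{ print NF }' | sort -u)

printf '%s\n' "$counts"
# prints: 4
```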

If these assumptions hold true, you could probably use something like this:

$ jq -R 'split("|") | {request:.[5]|fromjson,response:.[6]|fromjson}' input.psv

This should give you one object per input line, with the parsed request and response under the request and response keys. Then you can operate on these.
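
To make that concrete, here is a small sketch that runs the same kind of filter on one sample line and then pulls a value back out. The line is a shortened, made-up version of the fragments in the question, and it assumes the json really does sit in columns 5 and 6; the parentheses around each value are only there for readability.

```shell
# Shortened sample line, assembled from the fragments in the question (illustrative only).
line='Sep  1 15:52:26 | IdentityValidationApi |  |  |  | {"header":{"tenantId":"X03LHWE3"}} | {"responseHeader":{"fieldValue":"Engineer"}} | D2C CrossCore Request-Response | /api/Inquiry | 172.30.68.88 |  | True'

# -R reads each line as a raw string, -c prints one compact object per line.
result=$(printf '%s\n' "$line" |
  jq -Rc 'split("|") | {request: (.[5] | fromjson), response: (.[6] | fromjson)}')

printf '%s\n' "$result"
# prints: {"request":{"header":{"tenantId":"X03LHWE3"}},"response":{"responseHeader":{"fieldValue":"Engineer"}}}

# Pull a single value out of the combined object.
tenant=$(printf '%s\n' "$result" | jq -r '.request.header.tenantId')

printf '%s\n' "$tenant"
# prints: X03LHWE3
```

The compact one-object-per-line output should be easier to hand to whatever does the ingesting than the original mixed line.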

Upvotes: 1
