Goldfish
Goldfish

Reputation: 720

JQ filtering on fields in nested Objects

I have a large set of data, I am using JQ to construct the object that contains only the data I am interested in for a record. My problem is that I am starting to see duplicate objects, it seems my syntax is incorrect.

I am working with an object that contains flat fields and an array of subObjects, there are particular fields I want to pull out and make new objects that have all the data I want. Including some flat fields and some fields from the array objects.

Here is a smaller sample that helps demonstrate problem tmpData.json

{
"id": "0001",
"type": "donut",
"name": "Cake",
"ppu": 0.55,
"batter": [{
        "id": "1001",
        "type": "Regular"
    },
    {
        "id": "1002",
        "type": "Chocolate"
    },
    {
        "id": "1003",
        "type": "Blueberry"
    },
    {
        "id": "1004",
        "type": "Devil's Food"
    }
]
}

I run this : cat tmpData.txt | jq {'id: .id, type: .type, batter: .batter[].id'}

Which outputs this non-json set of objects (it's missing commas)

{
  "id": "0001",
  "type": "donut",
  "batter": "1001"
}
{
  "id": "0001",
  "type": "donut",
  "batter": "1002"
}
{
  "id": "0001",
  "type": "donut",
  "batter": "1003"
}
{
  "id": "0001",
  "type": "donut",
  "batter": "1004"
}

This is good. I now have objects each containing the parentID 0001 and the different items in the array are associated in each object.

When I run: cat tmpData.txt | jq {'id: .id, type: .type, batterID: .batter[].id, batterType: .batter[].type'}

With the added type field I get a lot of duplicates that wrongly associate items

{
  "id": "0001",
  "type": "donut",
  "batterID": "1001",
  "batterType": "Regular"
}
{
  "id": "0001",
  "type": "donut",
  "batterID": "1001",
  "batterType": "Chocolate"
}
{
  "id": "0001",
  "type": "donut",
  "batterID": "1001",
  "batterType": "Blueberry"
}
{
  "id": "0001",
  "type": "donut",
  "batterID": "1001",
  "batterType": "Devil's Food"
}
{
  "id": "0001",
  "type": "donut",
  "batterID": "1002",
  "batterType": "Regular"
}
{
  "id": "0001",
  "type": "donut",
  "batterID": "1002",
  "batterType": "Chocolate"
}
{
  "id": "0001",
  "type": "donut",
  "batterID": "1002",
  "batterType": "Blueberry"
}
{
  "id": "0001",
  "type": "donut",
  "batterID": "1002",
  "batterType": "Devil's Food"
}
{
  "id": "0001",
  "type": "donut",
  "batterID": "1003",
  "batterType": "Regular"
}
{
  "id": "0001",
  "type": "donut",
  "batterID": "1003",
  "batterType": "Chocolate"
}
{
  "id": "0001",
  "type": "donut",
  "batterID": "1003",
  "batterType": "Blueberry"
}
{
  "id": "0001",
  "type": "donut",
  "batterID": "1003",
  "batterType": "Devil's Food"
}
{
  "id": "0001",
  "type": "donut",
  "batterID": "1004",
  "batterType": "Regular"
}
{
  "id": "0001",
  "type": "donut",
  "batterID": "1004",
  "batterType": "Chocolate"
}
{
  "id": "0001",
  "type": "donut",
  "batterID": "1004",
  "batterType": "Blueberry"
}
{
  "id": "0001",
  "type": "donut",
  "batterID": "1004",
  "batterType": "Devil's Food"
}

Now I see that each batterID is in an object with every type regular, chocolate, blueberry. But in fact 1002 is only ever chocolate.

My ideal output would be like this

 [{
"id": "0001",
"type": "donut",
"batterID": "1001",
"batterType": "Regular"
},
{
"id": "0001",
"type": "donut",
"batterID": "1002",
"batterType": "Chocolate"
}] 

Your expertise is appreciated!

EDIT SOLVED: working command: cat tmpData.txt | jq '[{id, type} + (.batter[] | {batterId: .id, batterType: .type})]'

Upvotes: 2

Views: 4720

Answers (1)

peak
peak

Reputation: 116670

  1. The output "without commas" is a stream of JSON; to emit an array, wrap your jq filter in square brackets.
  2. You can abbreviate {id: id, type: .type} to {id, type}
  3. Your filter which repeats .batter[] has the effect of creating a Cartesian product. What you evidently want instead is to expand .batter just once.

Putting everything together:

[{id, type} + (.batter[] | {batterId: .id, batterType: .type})]

Upvotes: 6

Related Questions