seafre
seafre

Reputation: 97

Use JQ to parse JSON nested objects, using select to match key-value in nested object while showing existing structure

Use JQ to parse JSON nested objects, using select to match key-value in nested object while showing existing structure

I am trying to take a complex JSON file of 20,000+ lines and extract a specific key while retaining the surrounding metadata which adds necessary human-understandable context.


Data source (complex structure):

{
  "Marketplace": [
    {
      "Level1Name": "Company A Products",
      "Level1Array": [
        {
          "Level2Name": "USA Products List",
          "Level2Contents": [
            {
              "Level3Name": "ALL",
              "Level3URL": "https://a.com/products"
            },
            {
              "Level3Name": "Subset1001",
              "Level3URL": "https://a.com/products/subset1001"
            }
          ]
        }
      ]
    },
    {
      "Level1Name": "Company B Products",
      "Level1Array": [
        {
          "Level2Name": "USA Products List",
          "Level2Contents": [
            {
              "Level3Name": "ALL",
              "Level3URL": "https://b.com/products"
            },
            {
              "Level3Name": "Subset500",
              "Level3URL": "https://b.com/products/subset500"
            }
          ]
        },
        {
          "Level2Name": "EU Products List",
          "Level2Contents": [
            {
              "Level3Name": "ALL",
              "Level3URL": "https://b.eu/products"
            },
            {
              "Level3Name": "Subset200",
              "Level3URL": "https://b.eu/products/subset200"
            }
          ]
        }
      ]
    },
    {
      "Level1Name": "Company X Products",
      "Level1Array": [
        {
          "Level2Name": "Deleted Products",
          "Level2URL": "https://internal.x.com/products"
        }
      ]
    }
  ]
}

JQ command currently used to extract removes all other contextual metadata...

jq -r '(
         .Marketplace[].Level1Array[].Level2Contents[]
         | select (.Level3Name | index("ALL"))
         | [.]
         )'

Output given...

[
  {
    "Level3Name": "ALL",
    "Level3URL": "https://a.com/products"
  }
]
[
  {
    "Level3Name": "ALL",
    "Level3URL": "https://b.com/products"
  }
]
[
  {
    "Level3Name": "ALL",
    "Level3URL": "https://b.eu/products"
  }
]

Option 1 output desired, same JSON structure with removing all other objects which does not match select filter "ALL" string criteria

{
    "Marketplace":
  [
        {
            "Level1Name": "Company A Products",
            "Level1Array": [
                {
                    "Level2Name": "USA Products List",
                    "Level2Contents": [
                        {
                            "Level3Name": "ALL",
                            "Level3URL": "https://a.com/products"
                        }
                    ]
                }
            ]
        },
        {
            "Level1Name": "Company B Products",
            "Level1Array": [
                {
                    "Level2Name": "USA Products List",
                    "Level2Contents": [
                        {
                            "Level3Name": "ALL",
                            "Level3URL": "https://b.com/products"
                        }
                    ]
                },
                {
                    "Level2Name": "EU Products List",
                    "Level2Contents": [
                        {
                            "Level3Name": "ALL",
                            "Level3URL": "https://b.eu/products"
                        }
                    ]
                }
            ]
        }
    ]
}

Option 2 output desired, any similar format which can be iterated with loop such as:

{
  "Marketplace":
  [
    {
      "Level1Name": "Company A Products",
      "Level2Name": "USA Products List",
      "Level3Name": "ALL",
      "Level3URL": "https://a.com/products"
    },
    {
      "Level1Name": "Company B Products",
      "Level2Name": "USA Products List",
      "Level3Name": "ALL",
      "Level3URL": "https://b.com/products"
    },
    {
      "Level1Name": "Company B Products",
      "Level2Name": "EU Products List",
      "Level3Name": "ALL",
      "Level3URL": "https://b.eu/products"
    }
  ]
}

Upvotes: 1

Views: 2281

Answers (2)

peak
peak

Reputation: 116670

The following filter produces the "Option 2" output:

.Marketplace |= map(
  {Level1Name} as $Level1Name
  | .Level1Array[]
  | {Level2Name} as $Level2Name
  | .Level2Contents[]?
  | select(.Level3Name == "ALL")
  | $Level1Name + $Level2Name + . )

Breaking it down ...

One way to understand this is to consider:

.Marketplace[]
| {Level1Name} as $Level1Name
| .Level1Array[]
| {Level2Name} as $Level2Name
| .Level2Contents[]?             # in case .Level2Contents is missing
| if (.Level3Name == "ALL")
  then $Level1Name + $Level2Name + .
  else empty
  end

Addendum: "Name"

The OP subsequently asked what can be done if the three levels of "Name" keys are all named "Name". An answer can easily be obtained by tweaking the above, to yield:

.Marketplace |= map(
  {Level1Name: .Name} as $Level1Name
  | .Level1Array[]
  | {Level2Name: .Name} as $Level2Name
  | .Level2Contents[]?
  | select(.Name == "ALL")
  | $Level1Name + $Level2Name + . )

Output

In this case, the output would be as follows:

{
  "Marketplace": [
    {
      "Level1Name": "Company A Products",
      "Level2Name": "USA Products List",
      "Name": "ALL",
      "Level3URL": "https://a.com/products"
    },
    {
      "Level1Name": "Company B Products",
      "Level2Name": "USA Products List",
      "Name": "ALL",
      "Level3URL": "https://b.com/products"
    },
    {
      "Level1Name": "Company B Products",
      "Level2Name": "EU Products List",
      "Name": "ALL",
      "Level3URL": "https://b.eu/products"
    }
  ]
}

Upvotes: 2

Jeff Mercado
Jeff Mercado

Reputation: 134811

Here's a different approach you could take for this problem. As I understand it, you want a way to search your recursive tree of objects for some value, and remove any objects that do not have a property with that value.

What you could do is search for the paths of all values you wish to keep (has the value you're searching for), then remove all other objects that are not on the paths of any of the paths to keep.

def is_subpath($paths): [., length] as [$path, $length] |
    any($paths[]; $length <= length and $path == .[:$length]);
[paths(strings == "ALL")[:-1]] as $keepers
| delpaths([paths(objects) | select(is_subpath($keepers) | not)])
{
  "Marketplace": [
    {
      "Level1Name": "Company A Products",
      "Level1Array": [
        {
          "Level2Name": "USA Products List",
          "Level2Contents": [
            {
              "Level3Name": "ALL",
              "Level3URL": "https://a.com/products"
            }
          ]
        }
      ]
    },
    {
      "Level1Name": "Company B Products",
      "Level1Array": [
        {
          "Level2Name": "USA Products List",
          "Level2Contents": [
            {
              "Level3Name": "ALL",
              "Level3URL": "https://b.com/products"
            }
          ]
        },
        {
          "Level2Name": "EU Products List",
          "Level2Contents": [
            {
              "Level3Name": "ALL",
              "Level3URL": "https://b.eu/products"
            }
          ]
        }
      ]
    }
  ]
}

Upvotes: 0

Related Questions