Brian

Reputation: 190

Filter keys on various levels from large complex nested json

Problem

I have a list of YouTube videos and want to fetch their id, name and preview image. I'm using youtube-dl to get JSON output, which I parse for the keys id, title and the nested array thumbnails.

Input

As a topical example video, let's take the Perseverance landing:

youtube-dl -j "https://www.youtube.com/watch?v=4czjS9h4Fpg" | jq -r '[.id, .title, .thumbnails]'

This returns the following JSON:

[
  "4czjS9h4Fpg",
  "Perseverance Rover’s Descent and Touchdown on Mars (Official NASA Video)",
  [
    {
      "height": 94,
      "url": "https://i.ytimg.com/vi/4czjS9h4Fpg/hqdefault.jpg?sqp=-oaymwEbCKgBEF5IVfKriqkDDggBFQAAiEIYAXABwAEG&rs=AOn4CLBeXaobqWQ3MHAvEzLHQtitoAKKow",
      "width": 168,
      "resolution": "168x94",
      "id": "0"
    },
    {
      "height": 110,
      "url": "https://i.ytimg.com/vi/4czjS9h4Fpg/hqdefault.jpg?sqp=-oaymwEbCMQBEG5IVfKriqkDDggBFQAAiEIYAXABwAEG&rs=AOn4CLB2j8DNX2ZOyXHUS2MwRz4gG8admQ",
      "width": 196,
      "resolution": "196x110",
      "id": "1"
    },
    {
      "height": 138,
      "url": "https://i.ytimg.com/vi/4czjS9h4Fpg/hqdefault.jpg?sqp=-oaymwEcCPYBEIoBSFXyq4qpAw4IARUAAIhCGAFwAcABBg==&rs=AOn4CLDUIrTqT-g6F5z62q_Jq2RXy3AydQ",
      "width": 246,
      "resolution": "246x138",
      "id": "2"
    },
    {
      "height": 188,
      "url": "https://i.ytimg.com/vi/4czjS9h4Fpg/hqdefault.jpg?sqp=-oaymwEcCNACELwBSFXyq4qpAw4IARUAAIhCGAFwAcABBg==&rs=AOn4CLDtiAfOuC4lgjiMxXeJ3qIh7uV6Zg",
      "width": 336,
      "resolution": "336x188",
      "id": "3"
    },
    {
      "height": 1080,
      "url": "https://i.ytimg.com/vi/4czjS9h4Fpg/maxresdefault.jpg",
      "width": 1920,
      "resolution": "1920x1080",
      "id": "4"
    }
  ]
]

At this point I don't particularly care about selecting a specific thumbnail and would happily take all of them. I would like to process these further as CSV. I know that once I have selected the relevant keys and values I can pipe the result to @csv, but it's the selecting that I'm a bit lost on.

Expected Output

Ideally the output would look like this:

"4czjS9h4Fpg","Perseverance Rover’s Descent and Touchdown on Mars (Official NASA Video)","168x94","https://i.ytimg.com/vi/4czjS9h4Fpg/hqdefault.jpg?sqp=-oaymwEbCKgBEF5IVfKriqkDDggBFQAAiEIYAXABwAEG&rs=AOn4CLBeXaobqWQ3MHAvEzLHQtitoAKKow","196x110","https://i.ytimg.com/vi/4czjS9h4Fpg/hqdefault.jpg?sqp=-oaymwEbCMQBEG5IVfKriqkDDggBFQAAiEIYAXABwAEG&rs=AOn4CLB2j8DNX2ZOyXHUS2MwRz4gG8admQ","246x138","https://i.ytimg.com/vi/4czjS9h4Fpg/hqdefault.jpg?sqp=-oaymwEcCPYBEIoBSFXyq4qpAw4IARUAAIhCGAFwAcABBg==&rs=AOn4CLDUIrTqT-g6F5z62q_Jq2RXy3AydQ","336x188","https://i.ytimg.com/vi/4czjS9h4Fpg/hqdefault.jpg?sqp=-oaymwEcCNACELwBSFXyq4qpAw4IARUAAIhCGAFwAcABBg==&rs=AOn4CLDtiAfOuC4lgjiMxXeJ3qIh7uV6Zg","1920x1080","https://i.ytimg.com/vi/4czjS9h4Fpg/maxresdefault.jpg",

Pseudocode / Python?

In a more Pythonic notation, this is what I'm looking for in the output. I'm guessing I could pipe the JSON to Python or similar, but I think this should be simple to do in jq too, no?

$id,$title,($thumbnails.resolution,$thumbnails.url for item in thumbnails)
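
For comparison, here is a minimal Python sketch of that pseudocode. It assumes the full youtube-dl -j object with the keys id, title and thumbnails. The helper names video_row and to_csv_line and the shortened sample data are made up for illustration and are not real youtube-dl output.

```python
import csv
import io


def video_row(info):
    """Build one flat row: id, title, then resolution and url per thumbnail."""
    row = [info["id"], info["title"]]
    for thumb in info.get("thumbnails", []):
        row.extend([thumb["resolution"], thumb["url"]])
    return row


def to_csv_line(row):
    """Render a row as a single CSV line with every field quoted."""
    buf = io.StringIO()
    csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n").writerow(row)
    return buf.getvalue().rstrip("\n")


# Hypothetical, shortened sample standing in for the youtube-dl -j output.
sample = {
    "id": "4czjS9h4Fpg",
    "title": "Example title",
    "thumbnails": [
        {"resolution": "168x94", "url": "https://example.invalid/a.jpg", "id": "0"},
        {"resolution": "1920x1080", "url": "https://example.invalid/b.jpg", "id": "4"},
    ],
}

print(to_csv_line(video_row(sample)))
```

In practice the dictionary would come from json.load(sys.stdin) with the youtube-dl output piped in, rather than from a hard-coded sample.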

Upvotes: 2

Views: 122

Answers (1)

oguz ismail

Reputation: 50760

You can use map to expand each object in the third element of your array into its resolution and url, then append the result to the first two elements.

.[:2] + (.[2] | map(.resolution, .url)) | @csv
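
If the filter is easier to follow in Python terms, it can be read roughly as the sketch below. The function name flatten_row and the sample values are hypothetical; data stands for the three-element array [id, title, thumbnails] printed in the question.

```python
def flatten_row(data):
    """Mirror the jq filter on the array [id, title, thumbnails]."""
    head = data[:2]  # jq: .[:2] keeps id and title
    # jq: .[2] | map(.resolution, .url) emits two values per thumbnail object
    tail = [value for thumb in data[2] for value in (thumb["resolution"], thumb["url"])]
    return head + tail  # jq: + concatenates the two arrays


# Hypothetical, shortened sample in the shape of the question's array.
data = [
    "4czjS9h4Fpg",
    "Example title",
    [
        {"height": 94, "width": 168, "resolution": "168x94", "url": "https://example.invalid/a.jpg"},
        {"height": 1080, "width": 1920, "resolution": "1920x1080", "url": "https://example.invalid/b.jpg"},
    ],
]

print(flatten_row(data))
```

The comma inside map is what produces two outputs per object, which is why resolution and url end up side by side in the flat array that @csv then formats.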


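Alternatively, an error-suppressing approach like the one below yields the same result for your sample input. Iterating with .[] raises an error on the two strings, the ? operator suppresses it, and the // alternative operator then falls back to the string itself.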

map((.[] | .resolution, .url)? // .) | @csv


Upvotes: 3
