Anuj TBE

Reputation: 9790

Extract all URLs from a JSON object in Python

I have a large JSON object whose values contain URLs. The links can appear at any depth and under any key, and neither the depth nor the keys are known in advance. For example:

data = {
  "name": "John Doe",
  "a": "https:/example.com",
  "b": {
    "c": "https://example.com/path",
    "d": {
      "e": "https://example.com/abc/?q=u",
    }
  }
}

I want to extract all the links into a list like this:

links = ["https://example.com", "https://example.com/path", "https://example.com/abc/?q=u"]

How can I extract all the links from the object using Python?

Upvotes: 0

Views: 320

Answers (1)

Roy2012

Reputation: 12503

Here's a recursive solution:

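def extract_urls(d):
    """Recursively collect string values that look like URLs from a nested dict."""
    urls = []
    for k, v in d.items():
        # A string value starting with "http" (case-insensitive) is treated as a URL.
        if isinstance(v, str) and v.lower().startswith("http"):
            urls.append(v)
        # A nested dict is searched recursively.
        elif isinstance(v, dict):
            urls.extend(extract_urls(v))
    return urls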

extract_urls(data)

Output:

['https:/example.com',
 'https://example.com/path',
 'https://example.com/abc/?q=u']
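
The recursive approach above only descends into dicts. Parsed JSON can also contain lists, so here is a rough sketch of one way to cover that case too. The name extract_urls_nested and the sample data are made up for illustration, and the urlparse check (http/https scheme plus a non-empty host) is just one assumed definition of "looks like a URL", not the only reasonable one.

```python
from urllib.parse import urlparse


def extract_urls_nested(obj):
    """Collect URL-like strings from nested dicts and lists (hypothetical helper)."""
    urls = []
    if isinstance(obj, dict):
        # Keys are ignored; only values are searched.
        for value in obj.values():
            urls.extend(extract_urls_nested(value))
    elif isinstance(obj, list):
        for item in obj:
            urls.extend(extract_urls_nested(item))
    elif isinstance(obj, str):
        try:
            parsed = urlparse(obj)
        except ValueError:
            # urlparse can reject some malformed strings; skip those.
            return urls
        # Assumption: a URL needs an http(s) scheme and a host part.
        if parsed.scheme in ("http", "https") and parsed.netloc:
            urls.append(obj)
    return urls


# Illustrative sample data, not taken from the question.
sample = {
    "name": "John Doe",
    "a": "https://example.com",
    "b": {"c": ["https://example.com/path", {"d": "https://example.com/abc/?q=u"}]},
    "e": "https:/example.com",  # single slash, so no host is parsed
}

links = extract_urls_nested(sample)
print(links)
```

With this stricter check, a value such as "https:/example.com" (single slash) is skipped because no host is parsed from it. If you would rather keep such values, the startswith("http") test from the answer can be used instead.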

Upvotes: 2
