Reputation: 9790
I have a large JSON object whose values contain URLs. The links can appear at any depth and under any key, and neither the depth nor the keys are known in advance. For example:
data = {
    "name": "John Doe",
    "a": "https:/example.com",
    "b": {
        "c": "https://example.com/path",
        "d": {
            "e": "https://example.com/abc/?q=u",
        }
    }
}
I want to extract all the links into a list, like:
links = ["https://example.com", "https://example.com/path", "https://example.com/abc/?q=u"]
How can I extract all the links from the object using Python?
Upvotes: 0
Views: 320
Reputation: 12503
Here's a recursive solution:
def extract_urls(d):
    urls = []
    for k, v in d.items():
        # Collect string values that look like URLs
        if isinstance(v, str) and v.lower().startswith("http"):
            urls.append(v)
        # Recurse into nested dicts
        elif isinstance(v, dict):
            urls.extend(extract_urls(v))
    return urls

extract_urls(data)
Output:
['https:/example.com',
'https://example.com/path',
'https://example.com/abc/?q=u']
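One caveat: the recursion here only descends into dicts, so URLs sitting inside JSON arrays would be missed. Below is a minimal sketch that also walks lists, assuming the data came from something like json.loads (so the only containers are dict and list). The name extract_urls_nested and the sample data are hypothetical, made up for illustration. It keeps the same loose startswith("http") check as above.

```python
def extract_urls_nested(obj):
    """Collect URL-like strings from nested dicts and lists (illustrative sketch)."""
    urls = []
    if isinstance(obj, str):
        # Same loose check as the dict-only version: anything starting with "http"
        if obj.lower().startswith("http"):
            urls.append(obj)
    elif isinstance(obj, dict):
        # Keys are ignored; only values are searched
        for value in obj.values():
            urls.extend(extract_urls_nested(value))
    elif isinstance(obj, list):
        for item in obj:
            urls.extend(extract_urls_nested(item))
    # Numbers, booleans and None fall through and contribute nothing
    return urls


# Hypothetical sample mixing dicts, lists and non-string values
sample = {
    "a": "https://example.com",
    "items": [
        {"link": "http://example.org/x"},
        "not a url",
        ["https://example.com/deep"],
    ],
    "n": 3,
}

print(extract_urls_nested(sample))
```

If the input is very deeply nested, Python's recursion limit could become an issue, in which case an explicit stack would be the usual workaround.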
Upvotes: 2