Find Hyperlinks in Text using Python (Follow-up to another post)

Question

Regarding the post "Extracting a URL in Python", I have a follow-up question. Note: I'm new to SO and Python, so feel free to correct me on etiquette.

I pulled the regex from the above post and this works fine for me:

myString = """  """
print re.search("(?P<url>https?://[^\s]+)", myString).group("url")

However, what I really need to do is loop through a data set that I previously retrieved from a database. I tried the following, which fails with the error shown after it.

# Note: "data" here is actually a list of strings, not a data set
for pseudo_url in data:
    print re.search("(?P<url>https?://[^\s]+)", str(pseudo_url)).group("url")

Error:

Traceback (most recent call last):
  File "find_and_email_bad_press_urls.py", line 136, in <module>
    main()
  File "find_and_email_bad_press_urls.py", line 14, in main
    scrubbed_urls = extract_urls_from_raw_data(raw_url_data)
  File "find_and_email_bad_press_urls.py", line 47, in extract_urls_from_raw_data
    print re.search("(?P<url>https?://[^\s]+)", str(pseudo_url)).group("url")
AttributeError: 'NoneType' object has no attribute 'group'

When I Google this I find tons of irrelevant posts, so I was hoping SO could shed some light. My hunch is that the regex is blowing up on some null data, special character, etc., but I don't know enough about Python to figure it out. Casting to a string didn't help either.

Any ideas or workarounds to power through this would be much appreciated!
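
One possible workaround, offered as a sketch rather than a definitive fix: `re.search` returns `None` when the pattern does not match, so calling `.group("url")` on the result raises this `AttributeError` for any string that contains no URL. Checking the result before using it avoids the crash. The function name `extract_urls` and the sample list below are hypothetical stand-ins for the real database rows, and the code is written with `print(...)` so that it also runs on Python 3.

```python
import re

# Same pattern as in the question, compiled once; the named group is "url".
URL_PATTERN = re.compile(r"(?P<url>https?://[^\s]+)")


def extract_urls(rows):
    """Return the first URL found in each row, skipping rows without one."""
    urls = []
    for row in rows:
        # re.search returns None when nothing matches, so test before .group().
        match = URL_PATTERN.search(str(row))
        if match is None:
            continue
        urls.append(match.group("url"))
    return urls


# Hypothetical sample data standing in for the rows pulled from the database.
sample = [
    "see http://example.com/a for details",
    None,
    "no link here",
    "https://example.org/b",
]
print(extract_urls(sample))
```

Rows with no match are silently skipped here; logging them instead may be preferable if you need to find out which records lack a URL.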
