Stanko

Reputation: 4465

URL parsing only works with an explicit URL string

I'm reading URLs from a .csv file and trying to parse them. Why do I only get correct values for scheme and netloc when I pass the link explicitly to urlparse(...) (see variable o2), and not when I pass newsource to urlparse?

for line in file:
    source = str(line.split(",")[2])
    print("ORIGINAL URL: \n" + source)
    newsource = source.replace('"',"")
    print("REMOVING QUOTES: \n" + newsource)
    newsource.strip
    print("STRIPPING SPACES: \n" + newsource + "\n")
    o = urlparse(newsource)
    print("RESULT PARSING: " + str(o) + "\n")
    o2 = urlparse("http://nl.aldi.be/aldi_vlees_609.html")
    print("RESULT MANUAL PARSING: " + str(o2) + "\n")


Upvotes: 0

Views: 37

Answers (1)

wim

Reputation: 362746

I can see from the failed parse that your URL has a leading space character, which causes exactly the problem you describe:

>>> urlparse.urlparse(' http://nl.aldi.be/aldi_vlees_609.html')
ParseResult(scheme='', netloc='', path=' http://nl.aldi.be/aldi_vlees_609.html', params='', query='', fragment='')

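This line does nothing, because it only references the strip method without calling it (and even a call would return a new string rather than modify newsource in place):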

newsource.strip

You probably wanted:

newsource = newsource.strip()
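
For reference, here is a minimal sketch of the whole cleanup step with the fix applied. It assumes Python 3 (where urlparse lives in urllib.parse; in Python 2 it is the urlparse module), and the helper name clean_url and the sample field are made up for illustration:

```python
from urllib.parse import urlparse  # Python 3; in Python 2 use `from urlparse import urlparse`


def clean_url(field):
    """Remove double quotes and surrounding whitespace from a raw CSV field.

    `clean_url` is a hypothetical helper, not part of any library.
    """
    # str.replace and str.strip both return new strings, so the results
    # have to be used (here: chained and returned), not discarded.
    return field.replace('"', "").strip()


# Hypothetical raw field, similar to what line.split(",")[2] might return
raw = ' "http://nl.aldi.be/aldi_vlees_609.html"\n'

parsed = urlparse(clean_url(raw))
print(parsed.scheme, parsed.netloc)  # http nl.aldi.be
```

If the file contains quoted fields with embedded commas, the standard csv module is probably a safer way to split the lines than line.split(",").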

Upvotes: 1
