Reputation: 4465
I'm reading URLs from a .csv file and trying to parse them. Why do I only get correct values for scheme and netloc when I pass the link explicitly to urlparse(...) (see variable o2), and not when I pass newsource to urlparse?
for line in file:
    source = str(line.split(",")[2])
    print("ORIGINAL URL: \n" + source)
    newsource = source.replace('"',"")
    print("REMOVING QUOTES: \n" + newsource)
    newsource.strip
    print("STRIPPING SPACES: \n" + newsource + "\n")
    o = urlparse(newsource)
    print("RESULT PARSING: " + str(o) + "\n")
    o2 = urlparse("http://nl.aldi.be/aldi_vlees_609.html")
    print("RESULT MANUAL PARSING: " + str(o2) + "\n")
Upvotes: 0
Views: 37
Reputation: 362746
I can see from the failed parse that your URL has a leading space character, which reproduces the problem you describe:
>>> urlparse.urlparse(' http://nl.aldi.be/aldi_vlees_609.html')
ParseResult(scheme='', netloc='', path=' http://nl.aldi.be/aldi_vlees_609.html', params='', query='', fragment='')
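This line does nothing, because it only references the strip method without calling it (and since strings are immutable, even a call would have to be assigned back to have any effect):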
newsource.strip
You probably wanted:
newsource = newsource.strip()
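Putting the fix together, here is a minimal sketch of the cleaning step, assuming Python 3 (where urlparse lives in urllib.parse) and a URL in the third comma-separated field as in your loop. The helper name parse_csv_url and the sample line are made up for illustration:

```python
from urllib.parse import urlparse

def parse_csv_url(line, column=2):
    """Extract the URL from one CSV line and parse it (hypothetical helper)."""
    raw = line.split(",")[column]
    # str methods return new strings, so the result has to be kept:
    # drop the quotes first, then the surrounding whitespace and newline.
    cleaned = raw.replace('"', "").strip()
    return urlparse(cleaned)

# Sample line invented for this example: note the space before the quoted URL.
result = parse_csv_url('1,Aldi, "http://nl.aldi.be/aldi_vlees_609.html"\n')
print(result.scheme, result.netloc)  # http nl.aldi.be
```

If your fields can themselves contain commas, the csv module is probably a safer way to split the line than str.split(",").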
Upvotes: 1