Reputation: 1432
I was given the following Python 2.x code. I tried to convert it to Python 3.x by changing import urllib2 to from urllib.request import urlopen. I got rid of the urllib2 reference and ran the program. The document at the end of the URL was retrieved, but the program failed at the line indicated, throwing the error
TypeError: a bytes-like object is required, not 'str'
The document looks like this: b'9306112 9210128 9202065 \r\n9306114 9204065 9301122 \r\n9306115 \r\n9306116 \r\n9306117 \r\n9306118 \r\n9306119
I tried playing with the return value at that line and the one above (e.g., converting to bytes, splitting on different values), but nothing worked. Any thoughts as to what is happening?
import urllib2

CITATION_URL = "http://storage.googleapis.com/codeskulptor-alg/alg_phys-cite.txt"

def load_graph(graph_url):
    """
    Function that loads a graph given the URL
    for a text representation of the graph

    Returns a dictionary that models a graph
    """
    graph_file = urllib2.urlopen(graph_url)
    graph_text = graph_file.read()
    graph_lines = graph_text.split('\n')  # <--- The Problem
    graph_lines = graph_lines[ : -1]

    print "Loaded graph with", len(graph_lines), "nodes"

    answer_graph = {}
    for line in graph_lines:
        neighbors = line.split(' ')
        node = int(neighbors[0])
        answer_graph[node] = set([])
        for neighbor in neighbors[1 : -1]:
            answer_graph[node].add(int(neighbor))

    return answer_graph

citation_graph = load_graph(CITATION_URL)
print(citation_graph)
Upvotes: 0
Views: 661
Reputation: 25799
You can only split like with like: the separator must be the same type as the object being split. If you want to split on \n while still keeping graph_text as bytes, define the separator as a bytes sequence, too:
graph_lines = graph_text.split(b'\n')
Otherwise, if you know the codec your graph_text data was encoded with, first decode it into a str with graph_text.decode("<codec>") and then continue treating it as a str.
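As a rough illustration of the two options, here is a small sketch that runs on a sample copied from the start of the document shown in the question, so no network access is needed. The ASCII codec is an assumption that fits this digits-and-spaces sample; the real file's encoding is not stated in the question.

```python
# Sample bytes copied from the beginning of the document in the question.
graph_text = b'9306112 9210128 9202065 \r\n9306114 9204065 9301122 \r\n9306115 \r\n'

# Option 1: stay in bytes and split on a bytes separator.
byte_lines = graph_text.split(b'\n')

# Option 2: decode first, then split on a str separator.
# ASCII is an assumption here; substitute the codec the data really uses.
str_lines = graph_text.decode('ascii').split('\n')

# Mixing the two types reproduces the error from the question.
try:
    graph_text.split('\n')
except TypeError as exc:
    error_message = str(exc)
```

Note that with option 1 every later operation on the lines (for example line.split(' ')) also needs bytes arguments, which is why decoding once up front is often the simpler route.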
Upvotes: 1
Reputation: 6426
In order to treat a bytes object like a string, you need to decode it first. For example:
graph_text = graph_file.read().decode("utf-8")
if the encoding is UTF-8. This should allow you to treat this as a string instead of a sequence of bytes.
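Putting that together, a possible Python 3 version of the question's function might look like the sketch below. It assumes the file is UTF-8 (or plain ASCII, which decodes the same way); parse_graph is a hypothetical helper name introduced here so the parsing can be tried on a string without fetching the URL. The parsing logic itself is kept as in the original code.

```python
from urllib.request import urlopen

CITATION_URL = "http://storage.googleapis.com/codeskulptor-alg/alg_phys-cite.txt"

def parse_graph(graph_text):
    """Hypothetical helper: build the graph dictionary from decoded text."""
    graph_lines = graph_text.split('\n')
    graph_lines = graph_lines[:-1]  # drop the empty string after the final newline

    answer_graph = {}
    for line in graph_lines:
        neighbors = line.split(' ')
        node = int(neighbors[0])
        answer_graph[node] = set()
        # As in the original code, the last field is skipped; in the sample
        # data it is the trailing '\r' left over from the '\r\n' line ending.
        for neighbor in neighbors[1:-1]:
            answer_graph[node].add(int(neighbor))
    return answer_graph

def load_graph(graph_url):
    """Load a graph from a URL and return it as a dictionary of sets."""
    with urlopen(graph_url) as graph_file:
        # read() returns bytes in Python 3; UTF-8 is an assumption here.
        graph_text = graph_file.read().decode("utf-8")
    return parse_graph(graph_text)
```

Calling load_graph(CITATION_URL) would then fetch and parse the file, while parse_graph can be checked on a short string such as the sample in the question.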
Upvotes: 1