Reputation:
So far I fetch the data from an API website and read it. I've then split that information into a list:
Current output:
[b'688284,332,2830336', b'661114,40,37229', b'978148,1,81', b'262250,69,736665', b'269715,68,605568', b'171278,73,1026179', b'1249503,1,15', b'246783,64,424574', b'-1,1,0', b'1826857,1,25', b'1515172,1,0', b'-1,1,0', b'-1,1,0', b'1655032,1,0', b'-1,1,0', b'-1,1,0', b'1453895,1,0', b'1520874,1,0', b'1561752,1,0', b'1508907,1,0', b'1416987,1,0', b'1437689,1,0', b'1421569,1,0', b'1391397,1,0', b'-1,-1', b'-1,-1', b'-1,-1', b'']
What I need now is to turn the raw data in this list into clean data. For example, list[0] is:
b'688284,332,2830336'
I need the 3 numbers, without the commas or the b prefix, each stored in its own variable. How could I do that?
Upvotes: 0
Views: 4666
Reputation: 1122012
Split each value on the comma (using a b',' byte string literal as the separator), then use int() to convert each part to an integer. A nested list comprehension processes every bytestring and every value within it:
[[int(num) for num in value.split(b',')] for value in yourlist if value]
The if value filter skips empty strings.
This produces a list of lists, one inner list per bytestring:
>>> yourlist = [b'688284,332,2830336', b'661114,40,37229', b'978148,1,81', b'262250,69,736665', b'269715,68,605568', b'171278,73,1026179', b'1249503,1,15', b'246783,64,424574', b'-1,1,0', b'1826857,1,25', b'1515172,1,0', b'-1,1,0', b'-1,1,0', b'1655032,1,0', b'-1,1,0', b'-1,1,0', b'1453895,1,0', b'1520874,1,0', b'1561752,1,0', b'1508907,1,0', b'1416987,1,0', b'1437689,1,0', b'1421569,1,0', b'1391397,1,0', b'-1,-1', b'-1,-1', b'-1,-1', b'']
>>> [[int(num) for num in value.split(b',')] for value in yourlist if value]
[[688284, 332, 2830336], [661114, 40, 37229], [978148, 1, 81], [262250, 69, 736665], [269715, 68, 605568], [171278, 73, 1026179], [1249503, 1, 15], [246783, 64, 424574], [-1, 1, 0], [1826857, 1, 25], [1515172, 1, 0], [-1, 1, 0], [-1, 1, 0], [1655032, 1, 0], [-1, 1, 0], [-1, 1, 0], [1453895, 1, 0], [1520874, 1, 0], [1561752, 1, 0], [1508907, 1, 0], [1416987, 1, 0], [1437689, 1, 0], [1421569, 1, 0], [1391397, 1, 0], [-1, -1], [-1, -1], [-1, -1]]
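To get the three numbers of one entry into separate variables, as the question asks, tuple unpacking works on a parsed row. This is only a sketch on a shortened sample; the names first, second, third and triples are placeholders I picked, and it assumes you only unpack rows that have exactly three values:

```python
# Shortened stand-in for the full list from the question.
yourlist = [b'688284,332,2830336', b'661114,40,37229', b'-1,-1', b'']

rows = [[int(num) for num in value.split(b',')] for value in yourlist if value]

# Unpack the first row into three separate variables.
first, second, third = rows[0]
print(first, second, third)

# Rows with fewer than three values (such as b'-1,-1') would raise
# ValueError on a three-name unpack, so select the complete rows first.
triples = [row for row in rows if len(row) == 3]
```

The length check matters for this data because the last few entries only carry two values.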
If you want a flat list, use just one list comprehension combining the loops:
[int(num) for value in yourlist if value for num in value.split(b',')]
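As a quick check of the flat version, here it is on a shortened sample (the three-item list is an abbreviated stand-in for the real data, and flat is just a name chosen for the result):

```python
yourlist = [b'688284,332,2830336', b'-1,-1', b'']

# One comprehension, two loops: outer over bytestrings, inner over their parts.
flat = [int(num) for value in yourlist if value for num in value.split(b',')]
print(flat)  # [688284, 332, 2830336, -1, -1]
```

Note that a flat list loses the row boundaries, so it is only useful if you do not need to know which values belonged together.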
However, it sounds like you are really parsing CSV values here, from a web URL. Decode the data to text and feed it to a csv.reader() object to handle the splitting:
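import csv
import io
import urllib.request

# url is the API address you are already fetching
response = urllib.request.urlopen(url)
codec = response.info().get_param('charset', 'latin1')
# wrap the binary response so csv.reader() receives decoded text lines
reader = csv.reader(io.TextIOWrapper(response, encoding=codec))
for row in reader:
    row = [int(col) for col in row]
    # do something with each row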
Alternatively, read the response in one go (the urllib.request library seems to throw in a large delay for your sample URL when using anything but a straight-up .read() call):
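import csv
import urllib.request

# url is the API address you are already fetching
response = urllib.request.urlopen(url)
codec = response.info().get_param('charset', 'latin1')
# read everything at once, then decode the bytes to text
data = response.read().decode(codec)
reader = csv.reader(data.splitlines())
for row in reader:
    row = [int(col) for col in row]
    # do something with each row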
The get_param() call checks whether the server told us which codec to use to decode the response, falling back to ISO-8859-1 (Latin-1), the default for HTTP text responses.
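The charset lookup and the decode-then-parse step can be tried without a network request. This is a sketch under assumptions: the Message objects stand in for what response.info() returns (http.client.HTTPMessage subclasses email.message.Message), and the header values and the raw bytes are made up for illustration:

```python
import csv
from email.message import Message

# Stand-in for response.info() when the server declares a charset.
headers = Message()
headers['Content-Type'] = 'text/plain; charset=utf-8'
codec = headers.get_param('charset', 'latin1')

# Stand-in for a server that sends no charset: the default is returned.
bare = Message()
bare['Content-Type'] = 'text/plain'
fallback = bare.get_param('charset', 'latin1')

# Made-up stand-in for response.read().
raw = b'688284,332,2830336\n661114,40,37229\n-1,-1\n'
data = raw.decode(codec)

# csv.reader yields an empty list for blank lines, so skip those.
rows = [[int(col) for col in row] for row in csv.reader(data.splitlines()) if row]
print(codec, fallback, rows)
```

Skipping empty rows here plays the same role as the if value filter in the list comprehension version.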
Upvotes: 2