user5430996

Python 3 - split a list

So far I request data from an API website and read the response. I've then put it in a list by splitting the information:

Current outcome:

[b'688284,332,2830336', b'661114,40,37229', b'978148,1,81', b'262250,69,736665', b'269715,68,605568', b'171278,73,1026179', b'1249503,1,15', b'246783,64,424574', b'-1,1,0', b'1826857,1,25', b'1515172,1,0', b'-1,1,0', b'-1,1,0', b'1655032,1,0', b'-1,1,0', b'-1,1,0', b'1453895,1,0', b'1520874,1,0', b'1561752,1,0', b'1508907,1,0', b'1416987,1,0', b'1437689,1,0', b'1421569,1,0', b'1391397,1,0', b'-1,-1', b'-1,-1', b'-1,-1', b'']

But what I need to do is split the raw data in this list into clean data. For example, list[0] is:

b'688284,332,2830336'

I need the 3 numbers, without the commas or the b prefix, stored separately in other variables. How could I do that?

Upvotes: 0

Views: 4666

Answers (1)

Martijn Pieters

Reputation: 1122012

Split on the comma (using b'..' byte string literals), then use int() to convert to integers, using list comprehensions to process all strings and values in each string:

[[int(num) for num in value.split(b',')] for value in yourlist if value]

The if value filter skips empty strings.

This produces a nested list, with one inner list per bytestring:

>>> yourlist = [b'688284,332,2830336', b'661114,40,37229', b'978148,1,81', b'262250,69,736665', b'269715,68,605568', b'171278,73,1026179', b'1249503,1,15', b'246783,64,424574', b'-1,1,0', b'1826857,1,25', b'1515172,1,0', b'-1,1,0', b'-1,1,0', b'1655032,1,0', b'-1,1,0', b'-1,1,0', b'1453895,1,0', b'1520874,1,0', b'1561752,1,0', b'1508907,1,0', b'1416987,1,0', b'1437689,1,0', b'1421569,1,0', b'1391397,1,0', b'-1,-1', b'-1,-1', b'-1,-1', b'']
>>> [[int(num) for num in value.split(b',')] for value in yourlist if value]
[[688284, 332, 2830336], [661114, 40, 37229], [978148, 1, 81], [262250, 69, 736665], [269715, 68, 605568], [171278, 73, 1026179], [1249503, 1, 15], [246783, 64, 424574], [-1, 1, 0], [1826857, 1, 25], [1515172, 1, 0], [-1, 1, 0], [-1, 1, 0], [1655032, 1, 0], [-1, 1, 0], [-1, 1, 0], [1453895, 1, 0], [1520874, 1, 0], [1561752, 1, 0], [1508907, 1, 0], [1416987, 1, 0], [1437689, 1, 0], [1421569, 1, 0], [1391397, 1, 0], [-1, -1], [-1, -1], [-1, -1]]
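
Since the question asks for the three numbers in separate variables, here is a minimal sketch of unpacking each inner list. The names first, second and third are placeholders (the question doesn't say what the columns mean), and the sample is a shortened slice of the list above. Note that some rows hold only two values, so a blind three-way unpack would raise ValueError on those:

```python
# Shortened sample of the list from the question (an assumption for brevity).
yourlist = [b'688284,332,2830336', b'-1,1,0', b'-1,-1', b'']

rows = [[int(num) for num in value.split(b',')] for value in yourlist if value]

for row in rows:
    if len(row) == 3:
        # Placeholder names; rename to whatever the columns actually represent.
        first, second, third = row
        print(first, second, third)
    else:
        # Rows such as b'-1,-1' have only two values and can't be unpacked into three.
        print('short row:', row)
```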

If you want a flat list, use just one list comprehension combining the loops:

[int(num) for value in yourlist if value for num in value.split(b',')]
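
To make the clause order in that flat comprehension easier to read, here it is next to the equivalent explicit loops. This is only an illustration on a shortened sample; sample, flat and flat_loop are names made up for the sketch:

```python
# Shortened sample of the list from the question (an assumption for brevity).
sample = [b'688284,332,2830336', b'-1,-1', b'']

# The for/if/for clauses read left to right, in the same order as nested loops.
flat = [int(num) for value in sample if value for num in value.split(b',')]

# The same logic written out as explicit loops.
flat_loop = []
for value in sample:
    if value:  # skip the empty bytestring
        for num in value.split(b','):
            flat_loop.append(int(num))

print(flat)
```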

However, it sounds like you are really parsing CSV values here, from a web URL. Decode the data to text and feed it to a csv.reader() object to handle the splitting:

import io
import csv
import urllib.request

response = urllib.request.urlopen(url)
codec = response.info().get_param('charset', 'latin1')
reader = csv.reader(io.TextIOWrapper(response, encoding=codec))
for row in reader:
    row = [int(col) for col in row]
    # do something with each row

or read the response in one go (the urllib.request library seems to throw in a large delay when using anything but a straight-up .read() call for your sample URL):

response = urllib.request.urlopen(url)
codec = response.info().get_param('charset', 'latin1')
data = response.read().decode(codec)
reader = csv.reader(data.splitlines())
for row in reader:
    row = [int(col) for col in row]
    # do something with each row

The get_param() call checks whether the server told us what codec to use to decode the response, falling back to ISO-8859-1 (Latin-1) as the default for HTTP text responses.
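
If you want to try the csv.reader() approach without hitting the network, the following sketch swaps the HTTP response for an in-memory io.BytesIO object. The byte content is a hand-written stand-in for the real response body (an assumption, not actual API output), and 'latin1' is simply the fallback codec mentioned above:

```python
import csv
import io

# Stand-in for the HTTP response body; the real bytes would come from urlopen().
fake_body = io.BytesIO(b'688284,332,2830336\n661114,40,37229\n-1,-1\n\n')

# Decode bytes to text on the fly; newline='' is what the csv module docs recommend.
text = io.TextIOWrapper(fake_body, encoding='latin1', newline='')

# csv.reader yields an empty list for blank lines, so filter those out.
rows = [[int(col) for col in row] for row in csv.reader(text) if row]
print(rows)
```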

Upvotes: 2
