Reputation: 424
So I have a dataset and I am trying to split the data into 4 lists. I have written this function to do just that.
    def __spit_list_into_group(self, IDList, n, list1, list2, list3, list4):
        newlist = [IDList[i:i + n] for i in xrange(0, len(IDList), n)]
        list1, list2, list3, list4 = map(list, zip(*newlist))
        return list1, list2, list3, list4
However, when I set n to be 4, the code only splits the data into 3 lists, and when I set n to be 5, it splits the data into 5 lists. Why doesn't the code split the data into 4, and how can I get it to split the data into 4?
EDIT: I realized that this dataset has 15 data points, which is why I am only able to split it into 3 and 5. How do I split the data into 4 groups that are not necessarily equal in size? I need to write something flexible, as I want the same code to work on other datasets which may have more or fewer data points.
Upvotes: 0
Views: 65
Reputation: 2125
This problem can be simplified by using index slicing.
If you want to create n lists of roughly equal size, you can do this:

    def split_list(input_list, n):
        output_lists = [input_list[i::n] for i in range(n)]
        return output_lists
Each slice input_list[i::n] starts at index i and then steps through your input list in jumps of n, so the n starting offsets give the required number of output lists.
For example, say your input list is range(15), or [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14], and you want n=4 lists. This will return [[0, 4, 8, 12], [1, 5, 9, 13], [2, 6, 10, 14], [3, 7, 11]].
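To tie this back to the four named lists in the question, here is a small sketch of unpacking the result. The name id_list is a hypothetical stand-in for the asker's IDList, and the code is written for Python 3. Because the comprehension iterates over range(n), it always yields exactly n lists, so the unpacking works for any input length (some lists are simply empty when there are fewer than n items).

```python
def split_list(input_list, n):
    # One slice per starting offset 0..n-1, each taking every n-th element.
    return [input_list[i::n] for i in range(n)]

id_list = list(range(15))  # hypothetical stand-in for the asker's IDList

# Always exactly four lists, whatever len(id_list) is.
list1, list2, list3, list4 = split_list(id_list, 4)

sizes = [len(group) for group in (list1, list2, list3, list4)]
print(sizes)  # [4, 4, 4, 3]
```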
Alternatively, if you want to group the input list into n output lists, preserving the original order, you could do:

    import math

    def split_list(input_list, n):
        group_size = int(math.ceil(len(input_list)/float(n)))
        output_lists = [input_list[group_size*i:min((i+1)*group_size, len(input_list))]
                        for i in range(n)]
        return output_lists
Using the same example as above, this will return [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14]].
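One caveat with the ceiling-based version: for some lengths it can leave trailing groups empty. For instance, 5 items with n=4 gives a group size of 2, so the group sizes come out as 2, 2, 1, 0. If that matters, a possible variant (a sketch, not part of the original answer; split_contiguous is a hypothetical name) spreads the remainder over the first groups using divmod, so sizes differ by at most one:

```python
def split_contiguous(items, n):
    """Split items into n contiguous groups whose sizes differ by at most one."""
    base, extra = divmod(len(items), n)
    groups = []
    start = 0
    for i in range(n):
        # The first `extra` groups each take one additional item.
        size = base + (1 if i < extra else 0)
        groups.append(items[start:start + size])
        start += size
    return groups

print(split_contiguous(list(range(15)), 4))
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14]]
print(split_contiguous(list(range(5)), 4))
# [[0, 1], [2], [3], [4]]
```

Groups are only empty here when there are fewer items than groups.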
Upvotes: 2
Reputation: 107115
This is because the length of your IDList list is not divisible by 4, leaving a sub-list in newlist with only 3 items. When you zip the sub-lists, the zipping stops as soon as any one of the input iterators is exhausted, resulting in only 3 lists instead of 4. You can replace zip with itertools.izip_longest (after importing itertools) so that you get 4 lists as expected; by default the missing positions are filled with None.
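A minimal sketch of the difference, assuming a 15-item list like the one in the question. It is written for Python 3, where izip_longest is named itertools.zip_longest; id_list and chunk are hypothetical names used only for illustration.

```python
from itertools import zip_longest  # Python 3 name for itertools.izip_longest

def chunk(items, n):
    # Consecutive chunks of length n; the last chunk may be shorter.
    return [items[i:i + n] for i in range(0, len(items), n)]

id_list = list(range(15))     # hypothetical stand-in for IDList
new_list = chunk(id_list, 4)  # four chunks, the last one has only 3 items

# zip stops at the shortest chunk, so only 3 lists come out.
truncated = [list(t) for t in zip(*new_list)]
print(truncated)  # [[0, 4, 8, 12], [1, 5, 9, 13], [2, 6, 10, 14]]

# zip_longest keeps going and pads the short chunk with None.
padded = [list(t) for t in zip_longest(*new_list)]
print(padded[-1])  # [3, 7, 11, None]

# Drop the padding (this assumes None is never a real data value).
cleaned = [[x for x in group if x is not None] for group in padded]
print(cleaned[-1])  # [3, 7, 11]
```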
Upvotes: 1