Reputation: 31
I have an algorithmic question for you. No need for an application context, I'll give you a direct example.
Here is a possible input: input = [ 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4 ]. Let's assume a batch size of 5. The idea here is to output lists of maximum size 5 without having separate number, in short: 2 identical numbers cannot be in separate sub-lists. Example output: [ [1, 1, 1], [2, 2, 2, 2], [3, 3, 4, 4, 4] ]
Assumptions: Numbers always sorted, batch_size always larger than possible number of numbers
Do you have a more elegant solution than the one I just found?
i = 0
batch_size = 5
res = []
while i < len(input):
# Retrieve the data list according to the batch size
data = input[i: i + size]
# Increment the index
i += size
# See what's the next output looks like
future_data = input[i: i + size]
if future_data and future_data[0] == data[-1]:
# So we count how many times this number appears in our current list
# and subtract that from our index
cp = data.count(data[-1])
i -= cp
# Then remove from the current list all occurrence of that number
data = data[:-cp]
res.append(data)
Edit: according to @juanpa.arrivillaga's answer:
Thank you all for your reactivity and your answers.
I continue on episode 2, I gave you here my simplified problem and I thought your solution would be sufficient, despite your response, I do not see how to adapt the solution of @juanpa.arrivillaga to my data format, in fact the input would look more like :
input = {
'data_1' : {
'id': [1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4],
'char': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L']
}
}
!the size of the lists in value of 'id' and 'char' are necessarily equal!
the output must look like:
[
[1, 'A', 1, 'B', 1, 'C'],
[2, 'D', 2, 'E', 2, 'F', 2, 'G'],
[3, 'H', 3, 'I', 4, 'J', 4, 'K', 4, 'L']
]
I am aware that the data structure is not optimal, unfortunately I don't have the hand on it and is therefore unchangeable...
Still the same constrains as before (batch size is working only on the id, am I clear enough ?)
Upvotes: 3
Views: 523
Reputation: 348
If you don't want to use any library(module), then this solution is for you.
batch_size = 5
id = list(map(int, input("Enter multiple ids in one line with space between them: ").split()))
char = list(map(str, input("Enter multiple char in one line with space between them: ").split()))
id_char_input = {char[i]: id[i] for i in range(len(id))}
id_unique = list(set([value for key, value in id_char_input.items()]))
group_input = [[] for _ in range(len(id_unique))]
output = []
j = 0
for key, value in id_char_input.items():
group_input[id_unique.index(value)].append(value)
group_input[id_unique.index(value)].append(key)
while j < len(group_input):
if j != len(group_input) - 1:
if len(group_input[j] + group_input[j + 1]) <= 2*batch_size:
output.append(group_input[j] + group_input[j + 1])
j += 2
else:
if group_input[j] not in output:
output.append(group_input[j])
j += 1
else:
output.append(group_input[j])
break
print(output)
Output:
Enter multiple ids in one line with space between them: 1 1 1 2 2 2 2 3 3 4 4 4
Enter multiple char in one line with space between them: A B C D E F G H I J K L
[[1, 'A', 1, 'B', 1, 'C'], [2, 'D', 2, 'E', 2, 'F', 2, 'G'], [3, 'H', 3, 'I', 4, 'J', 4, 'K', 4, 'L']]
EDIT: Now you can take values of id and char from the input.
Upvotes: 0
Reputation: 71471
You can use itertools.groupby
with a recursive generator function to find possible merges that meet your criteria. This way, you can better handle more ambiguous cases where it is unclear which "sibling" group should absorb the double pairing and/or truncated results where groups have a length greater than batch_size
:
from itertools import groupby
data = {'data_1': {'id': [1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4], 'char': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L']}}
batch_size = 5
def get_groups(d, c = [], p = []):
if not d and not p and all((l:=len(i)) <= batch_size and i and l != 2 for i in c):
#found valid combo
yield c
elif d:
_p = (k:=(p+d[0]))[(b:=(l if (l:=len(k)) <= batch_size else -1*(l-batch_size))):]
if l == 2 and not _p:
#if group size is two, then we have to find possible merges for group
yield from get_groups(d[1:], c=c, p = k)
yield from get_groups([c[-1]+k]+d[1:], c=c[:-1], p = [])
elif _p:
#group size > batch_size, need to find possible siblings that can take subgroups of groups
for i in range(batch_size):
yield from get_groups(d[1:], c=c+[k[:i]], p = k[i:])
if c and len(c[-1]) + i <= batch_size:
yield from get_groups(d[1:], c=c[:-1]+[c[-1]+k[batch_size:batch_size+i]]+[k[:batch_size]], p = k[batch_size+i:])
yield from get_groups(d[1:], c=c+[k[:b]], p = _p)
elif p:
yield from get_groups(d[1:], c=c+[p], p = [])
combo = next(get_groups([list(b) for _, b in groupby(data['data_1']['id'])]))
c_m = iter(data['data_1']['char'])
result = [[i for j in [(x, next(c_m)) for x in y] for i in j] for y in combo]
Output:
[[1, 'A', 1, 'B', 1, 'C'], [2, 'D', 2, 'E', 2, 'F', 2, 'G'], [3, 'H', 3, 'I', 4, 'J', 4, 'K', 4, 'L']]
Upvotes: 1
Reputation: 96360
Here is how I would do this, in a single pass:
>>> import itertools
>>> batch_size = 5
>>> result = [[]]
>>> input_data = [ 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4 ]
>>> for _, g in itertools.groupby(input_data):
... current = list(g)
... if len(result[-1]) + len(current) <= batch_size:
... result[-1].extend(current)
... else:
... result.append(current)
...
>>> result
[[1, 1, 1], [2, 2, 2, 3, 3], [4, 4, 4]]
Let's break it down into intermediate steps to aid in understanding, first, this is what itertools.groupby
does eagerly evaluated:
>>> import itertools
>>> batch_size = 5
>>> grouped = [list(g) for k,g in itertools.groupby([ 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4 ])]
>>> grouped
[[1, 1, 1], [2, 2, 2], [3, 3], [4, 4, 4]]
Then, simply build up your result, which is a list of sublists. If it can fit into the current sublist, add the group to the sublist, otherwise, append a new sublist consisting of that group (which we can assume is not larger than batch_size):
>>> result = [[]]
>>> for group in grouped:
... if len(result[-1]) + len(group) <= batch_size:
... result[-1].extend(group)
... else:
... result.append(group[:])
...
>>> result
[[1, 1, 1], [2, 2, 2, 3, 3], [4, 4, 4]]
The above does two passes on the data, the first example posted does one pass.
Note, if using itertools.groupby
feels like "cheating", you can implement something that does the trick for this case relatively easy:
def simple_groupby(data):
it = iter(data)
empty = object()
current = next(it, empty)
if current is empty:
return
prev = current
acc = [current]
for current in it:
if prev == current:
acc.append(current)
else:
yield acc
acc = [current]
prev = current
yield acc
Upvotes: 4