Reputation: 53
I've organized my data into 3 lists. The first one simply contains floating-point numbers, some of which are duplicates. The second and third lists contain 1D arrays of variable length.
The first list is sorted and all lists contain the same number of elements.
The overall format is this:
a = [1.0, 1.5, 1.5, 2 , 2]
b = [arr([1 2 3 4 10]), arr([4 8 10 11 5 6 12]), arr([1 5 7]), arr([70 1 2]), arr([1])]
c = [arr([3 4 8]), arr([5 6 12]), arr([6 7 10 123 14]), arr([70 1 2]), arr([1 5 10 4])]
I'm trying to find a way to merge the arrays in lists b and c if their corresponding float number is the same in the list a. For the example above, the desired result would be:
a = [1.0, 1.5, 2]
b = [arr([1 2 3 4 10]), arr([4 8 10 11 5 6 12 1 5 7]), arr([70 1 2 1])]
c = [arr([3 4 8]), arr([5 6 12 6 7 10 123 14]), arr([70 1 2 1 5 10 4])]
How would I go about doing this? Does it have something to do with zip?
Upvotes: 2
Views: 1396
Reputation: 26039
Since a is sorted, I would use itertools.groupby. Similar to @MadPhysicist's answer, but iterating over the zip of lists:
import numpy as np
from itertools import groupby
arr = np.array
a = [1.0, 1.5, 1.5, 2 , 2]
b = [arr([1, 2, 3, 4, 10]), arr([4, 8, 10, 11, 5, 6, 12]), arr([1, 5, 7]), arr([70, 1, 2]), arr([1])]
c = [arr([3, 4, 8]), arr([5, 6, 12]), arr([6, 7, 10, 123, 14]), arr([70, 1, 2]), arr([1, 5, 10, 4])]
res_a, res_b, res_c = [], [], []
for k, g in groupby(zip(a, b, c), key=lambda x: x[0]):
    # g is a one-shot iterator of (a, b, c) tuples; keep it as a list so it can be read twice
    g = list(g)
    res_a.append(k)
    res_b.append(np.concatenate([x[1] for x in g]))
    res_c.append(np.concatenate([x[2] for x in g]))
which outputs res_a, res_b and res_c as:
[1.0, 1.5, 2]
[array([ 1, 2, 3, 4, 10]), array([ 4, 8, 10, 11, 5, 6, 12, 1, 5, 7]), array([70, 1, 2, 1])]
[array([3, 4, 8]), array([ 5, 6, 12, 6, 7, 10, 123, 14]), array([70, 1, 2, 1, 5, 10, 4])]
Alternatively, in case a is not sorted, you can use defaultdict:
import numpy as np
from collections import defaultdict
arr = np.array
a = [1.0, 1.5, 1.5, 2 , 2]
b = [arr([1, 2, 3, 4, 10]), arr([4, 8, 10, 11, 5, 6, 12]), arr([1, 5, 7]), arr([70, 1, 2]), arr([1])]
c = [arr([3, 4, 8]), arr([5, 6, 12]), arr([6, 7, 10, 123, 14]), arr([70, 1, 2]), arr([1, 5, 10, 4])]
res_a, res_b, res_c = [], [], []
d = defaultdict(list)
for x, y, z in zip(a, b, c):
    # collect the (b, c) pairs that share the same key from a
    d[x].append([y, z])
for k, v in d.items():
    res_a.append(k)
    res_b.append(np.concatenate([x[0] for x in v]))
    res_c.append(np.concatenate([x[1] for x in v]))
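To illustrate why the dictionary approach does not need a sorted a, here is a minimal sketch with made-up, unsorted input (the values are hypothetical and not from the question). It assumes Python 3.7+, where dicts keep insertion order, so the keys come out in first-seen order rather than sorted.

```python
import numpy as np
from collections import defaultdict

# Hypothetical unsorted input: equal keys are not adjacent.
a = [2.0, 1.0, 2.0, 1.5, 1.0]
b = [np.array([1, 2]), np.array([3]), np.array([4, 5]), np.array([6]), np.array([7, 8])]

groups = defaultdict(list)
for key, arr_b in zip(a, b):
    groups[key].append(arr_b)

# Keys appear in the order they were first seen, not in sorted order.
res_a = list(groups)
res_b = [np.concatenate(v) for v in groups.values()]

print(res_a)                        # [2.0, 1.0, 1.5]
print([x.tolist() for x in res_b])  # [[1, 2, 4, 5], [3, 7, 8], [6]]
```

If a sorted result is needed, iterating over sorted(groups) instead of the dict itself should be enough.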
Upvotes: 3
Reputation: 476
EDIT: the solutions from @Austin and @Mad Physicist are better, so use those instead. Mine reinvents the wheel, which is not the Pythonic way.
I think that modifying the original arrays is dangerous. This approach uses twice as much memory, but it's safe to iterate and do operations this way. What's happening:
1. Iterate over a and search for the indices of other occurrences in the rest of a (we exclude the current index with remove(i)).
2. If there are none, append the corresponding elements of b and c as usual.
3. Otherwise, merge them into temporary lists and append those to a1, b1 and c1. Block the value so that a duplicate won't trigger another merge. The if at the beginning checks whether a value is blocked.
I used np.where since it is a bit faster than using list comprehensions. Feel free to edit data formats etc.; mine are simple for demonstration purposes.
import numpy as np
a = [1.0, 1.5, 1.5, 2, 2]
b = [[1, 2, 3, 4, 10], [4, 8, 10, 11, 5, 6, 12], [1, 5, 7], [70, 1, 2], [1]]
c = [[3, 4, 8], [5, 6, 12], [6, 7, 10, 123, 14], [70, 1, 2], [1, 5, 10, 4]]
def function(list1, list2, list3):
    a1 = []
    b1 = []
    c1 = []
    merged_list = []
    # array copy of the keys so np.where can find every matching index
    list_as_array = np.array(list1)
    # to preserve original index we use enumerate
    for i, item in enumerate(list1):
        # to avoid merging twice we skip values from a that we already checked
        if item not in merged_list:
            ixs = np.where(list_as_array == item)[0].tolist()
            # removing our original index
            ixs.remove(i)
            # if empty append to new list as usual since we don't need merge
            if not ixs:
                a1.append(item)
                b1.append(list2[i])
                c1.append(list3[i])
                merged_list.append(item)
            else:
                temp1 = [*list2[i]]  # temp b and c prefilled with first b and c
                temp2 = [*list3[i]]
                for ix in ixs:
                    temp1.extend(list2[ix])
                    temp2.extend(list3[ix])
                a1.append(item)
                b1.append(temp1)
                c1.append(temp2)
                merged_list.append(item)
    print(a1)
    print(b1)
    print(c1)
    return a1, b1, c1


function(a, b, c)
# example output
# [1.0, 1.5, 2]
# [[1, 2, 3, 4, 10], [4, 8, 10, 11, 5, 6, 12, 1, 5, 7], [70, 1, 2, 1]]
# [[3, 4, 8], [5, 6, 12, 6, 7, 10, 123, 14], [70, 1, 2, 1, 5, 10, 4]]
Upvotes: 1
Reputation: 114330
Since a is sorted, you could use itertools.groupby on the range of indices in your list, keyed by a:
import numpy as np
from itertools import groupby
result_a = []
result_b = []
result_c = []
for k, group in groupby(range(len(a)), key=a.__getitem__):
    # indices of a that share the value k
    group = list(group)
    index = slice(group[0], group[-1] + 1)
    result_a.append(k)
    result_b.append(np.concatenate(b[index]))
    result_c.append(np.concatenate(c[index]))
group is an iterator, so you need to consume it to get the actual indices it represents. Each group contains all the indices that correspond to the same value in a.
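As a small illustration of what the groups contain, the sketch below materializes each group into a list as soon as it is produced, using the a from the question. Only standard-library calls are used.

```python
from itertools import groupby

a = [1.0, 1.5, 1.5, 2, 2]

# Consume each group immediately: the indices of a that share one value.
groups = [(k, list(g)) for k, g in groupby(range(len(a)), key=a.__getitem__)]

print(groups)  # [(1.0, [0]), (1.5, [1, 2]), (2, [3, 4])]
```

Because a is sorted, the indices in each group are contiguous, which is what makes the slice in the loop above valid.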
slice(...) is what gets passed to list.__getitem__ any time there is a : in the indexing expression. Indexing with index is equivalent to indexing with [group[0]:group[-1] + 1]. This slices out the portion of the list that corresponds to each key in a.
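A quick way to check the equivalence between a slice object and the colon syntax is the sketch below; the list contents are placeholders chosen for illustration.

```python
# Placeholder data standing in for one of the lists of arrays.
b = ["w", "x", "y", "z"]
group = [1, 2]  # indices belonging to one key

index = slice(group[0], group[-1] + 1)

# A slice object selects exactly what the colon syntax selects.
same = b[index] == b[group[0]:group[-1] + 1]
print(b[index])  # ['x', 'y']
print(same)      # True
```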
Finally, np.concatenate just merges your arrays together in batches.
If you wanted to do this without doing list(group), you could consume the iterator in other ways, without keeping the values around. For example, you could get groupby to do it for you:
import numpy as np
from itertools import groupby
result_a = []
result_b = []
result_c = []
prev = None
for k, group in groupby(range(len(a)), key=a.__getitem__):
    # first index of the current group; groupby skips the rest when it advances
    index = next(group)
    result_a.append(k)
    if prev is not None:
        result_b.append(np.concatenate(b[prev:index]))
        result_c.append(np.concatenate(c[prev:index]))
    prev = index
# flush the last group
if prev is not None:
    result_b.append(np.concatenate(b[prev:]))
    result_c.append(np.concatenate(c[prev:]))
At that point, you wouldn't even really need to use groupby since it wouldn't be much more work to keep track of everything yourself:
result_a = []
result_b = []
result_c = []
k = None
for i, n in enumerate(a):
    # still inside the current run of equal values
    if n == k:
        continue
    result_a.append(n)
    if k is not None:
        result_b.append(np.concatenate(b[prev:i]))
        result_c.append(np.concatenate(c[prev:i]))
    k = n
    prev = i  # start index of the new run
# flush the last run
if k is not None:
    result_b.append(np.concatenate(b[prev:]))
    result_c.append(np.concatenate(c[prev:]))
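For reuse with any number of parallel lists, the same idea could be wrapped in a helper. The sketch below is one possible shape, not something from the question: the name merge_by_key and its signature are hypothetical, and it assumes the keys are already sorted, so that equal keys are adjacent.

```python
import numpy as np
from itertools import groupby


def merge_by_key(keys, *columns):
    """Concatenate the arrays of each column over runs of equal keys.

    Hypothetical helper; assumes equal keys are adjacent (keys sorted).
    """
    merged_keys = []
    merged_cols = [[] for _ in columns]
    for key, run in groupby(zip(keys, *columns), key=lambda row: row[0]):
        rows = list(run)
        merged_keys.append(key)
        # Transpose the rows of this run back into per-column tuples of arrays.
        per_column = zip(*(row[1:] for row in rows))
        for out, arrays in zip(merged_cols, per_column):
            out.append(np.concatenate(arrays))
    return (merged_keys, *merged_cols)


arr = np.array
a = [1.0, 1.5, 1.5, 2, 2]
b = [arr([1, 2, 3, 4, 10]), arr([4, 8, 10, 11, 5, 6, 12]), arr([1, 5, 7]), arr([70, 1, 2]), arr([1])]
c = [arr([3, 4, 8]), arr([5, 6, 12]), arr([6, 7, 10, 123, 14]), arr([70, 1, 2]), arr([1, 5, 10, 4])]

new_a, new_b, new_c = merge_by_key(a, b, c)
print(new_a)                        # [1.0, 1.5, 2]
print([x.tolist() for x in new_b])  # [[1, 2, 3, 4, 10], [4, 8, 10, 11, 5, 6, 12, 1, 5, 7], [70, 1, 2, 1]]
```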
Upvotes: 1