Reputation: 43
I have a nested list that I need to chain, then run metrics, then "unchain" back into its original nested format. Here is example data to illustrate:
from itertools import chain
nested_list = [['x', 'xx', 'xxx'], ['yy', 'yyy', 'y', 'yyyy'], ['zz', 'z']]
chained_list = list(chain(*nested_list))
print("chained_list: \n", chained_list)
metrics_list = [str(chained_list[x]) + '_score'
                for x in range(len(chained_list))]
print("metrics_list: \n", metrics_list)
zipped_scores = list(zip(chained_list, metrics_list))
print("zipped_scores: \n", zipped_scores)
unchain_function = '????'
chained_list:
['x', 'xx', 'xxx', 'yy', 'yyy', 'y', 'yyyy', 'zz', 'z']
metrics_list:
['x_score', 'xx_score', 'xxx_score', 'yy_score', 'yyy_score', 'y_score', 'yyyy_score', 'zz_score', 'z_score']
zipped_scores:
[('x', 'x_score'), ('xx', 'xx_score'), ('xxx', 'xxx_score'), ('yy', 'yy_score'), ('yyy', 'yyy_score'), ('y', 'y_score'), ('yyyy', 'yyyy_score'), ('zz', 'zz_score'), ('z', 'z_score')]
Is there a Python function or Pythonic way to write an "unchain_function" that produces this desired output?
[
    [
        ('x', 'x_score'),
        ('xx', 'xx_score'),
        ('xxx', 'xxx_score')
    ],
    [
        ('yy', 'yy_score'),
        ('yyy', 'yyy_score'),
        ('y', 'y_score'),
        ('yyyy', 'yyyy_score')
    ],
    [
        ('zz', 'zz_score'),
        ('z', 'z_score')
    ]
]
(background: this is for running metrics on lists having lengths greater than 100,000)
Upvotes: 3
Views: 582
Reputation: 1810
Here's a simple way to get the desired output.
nested_list = [['x', 'xx', 'xxx'], ['yy', 'yyy', 'y', 'yyyy'], ['zz', 'z']]
zipped_scores = [('x', 'x_score'), ('xx', 'xx_score'), ('xxx', 'xxx_score'), ('yy', 'yy_score'), ('yyy', 'yyy_score'), ('y', 'y_score'), ('yyyy', 'yyyy_score'), ('zz', 'zz_score'), ('z', 'z_score')]
zipped_scores_iter = iter(zipped_scores)
unchained_list = [[next(zipped_scores_iter) for x in sublist] for sublist in nested_list]
Notice that the following list comprehension replicates nested_list exactly:
[[x for x in sublist] for sublist in nested_list]
We have the structure. All we want to do is swap the original x for the new value:
[[corresponding_value_for(x) for x in sublist] for sublist in nested_list]
I think the accepted answer takes the same approach, but uses a more complicated method of getting the corresponding value.
There's already a one-to-one correspondence between the input (nested_list) and the desired values (zipped_scores), given by their order. Therefore, we can replace x with the corresponding element from zipped_scores by pulling the next item from an iterator.
[[next(zipped_scores_iter) for x in sublist] for sublist in nested_list]
By the way, while in this case it doesn't seem like flattening the list is needed to get the desired output, I've encountered a similar problem where flattening and then re-grouping was useful (sending a batch of inputs to an external process). This was my approach.
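The iterator idea can also be wrapped in a small reusable helper. This is only a sketch: `unchain` is a hypothetical name, not a standard-library function, and the scoring step is a stand-in for the real metric. It assumes the flat list holds exactly as many items as the nested template.

```python
from itertools import chain, islice

def unchain(flat, template):
    """Regroup `flat` into sublists whose lengths match those in `template`."""
    it = iter(flat)
    # islice pulls the next len(sublist) items from the shared iterator
    return [list(islice(it, len(sublist))) for sublist in template]

nested_list = [['x', 'xx', 'xxx'], ['yy', 'yyy', 'y', 'yyyy'], ['zz', 'z']]
chained_list = list(chain.from_iterable(nested_list))
# stand-in for the real metric step
zipped_scores = [(item, item + '_score') for item in chained_list]

unchained_list = unchain(zipped_scores, nested_list)
print(unchained_list)
```

Because `islice` consumes the shared iterator lazily, each sublist picks up where the previous one stopped, so only the sublist lengths of the template matter.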
Upvotes: 0
Reputation: 162
You have made the algorithm more complex than it needs to be. You can do it in a few simple steps:
First, create an empty nested list of the desired size
formatted_list = [[] for _ in range(len(nested_list))]
Then loop over the original list and fill in each sublist, pairing every item with its score
for K in range(len(nested_list)):
    for i in nested_list[K]:
        formatted_list[K].append((i, i + '_score'))
print(formatted_list)
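If the metric can be computed one item at a time, the per-sublist loop can also be written as a nested comprehension that never flattens the data. This is a minimal sketch; `score` is a hypothetical stand-in for the real metric.

```python
nested_list = [['x', 'xx', 'xxx'], ['yy', 'yyy', 'y', 'yyyy'], ['zz', 'z']]

def score(item):
    # hypothetical stand-in for the real metric
    return item + '_score'

# build (item, score) pairs sublist by sublist, preserving the nesting
formatted_list = [[(item, score(item)) for item in sublist] for sublist in nested_list]
print(formatted_list)
```

This only applies when the metric does not need the whole flattened batch at once.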
Upvotes: 0
Reputation: 44525
I think you just want to group your data according to some condition, here the first character of the first element in each tuple.
Given
Your flattened, zipped data:
data = [
('x', 'x_score'), ('xx', 'xx_score'), ('xxx', 'xxx_score'),
('yy', 'yy_score'), ('yyy', 'yyy_score'), ('y', 'y_score'), ('yyyy', 'yyyy_score'),
('zz', 'zz_score'), ('z', 'z_score')
]
Code
import itertools
[list(g) for _, g in itertools.groupby(data, key=lambda x: x[0][0])]
Output
[[('x', 'x_score'), ('xx', 'xx_score'), ('xxx', 'xxx_score')],
[('yy', 'yy_score'),
('yyy', 'yyy_score'),
('y', 'y_score'),
('yyyy', 'yyyy_score')],
[('zz', 'zz_score'), ('z', 'z_score')]]
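Grouping by first letter relies on each sublist having its own distinct leading character. When that does not hold, one option is to carry the sublist index through the flattening step and group on it instead. The sketch below assumes the metric step can pass the index along untouched; the sample data and the scoring are stand-ins.

```python
from itertools import groupby
from operator import itemgetter

# first letters do NOT identify the sublists in this sample
nested_list = [['x', 'xx'], ['xxx', 'y'], ['z']]

# flatten while remembering which sublist each item came from
tagged = [(idx, item) for idx, sublist in enumerate(nested_list) for item in sublist]

# stand-in for the real metric step; the index is passed through unchanged
scored = [(idx, (item, item + '_score')) for idx, item in tagged]

# group consecutive entries that share a sublist index, then drop the index
regrouped = [[pair for _, pair in group]
             for _, group in groupby(scored, key=itemgetter(0))]
print(regrouped)
```

Note that empty sublists in the original contribute no tagged items, so they would not reappear in the regrouped result.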
Upvotes: 0
Reputation: 24691
I dunno about how pythonic this is, but this should work. Long story short, we're using a Wrapper class to turn an immutable primitive (which is impossible to change without replacing) into a mutable variable (so we can have multiple references to the same variable, each organized differently).
We create an identical nested list except that each value is a Wrapper of the corresponding value from the original list. Then, we apply the same transformation to unchain the wrapper list. Copy changes from the processed chained list onto the chained wrapper list, and then access those changes from the nested wrapper list and unwrap them.
I think that using an explicit and simple class called Wrapper is easier to understand, but you could do essentially the same thing by using a singleton list to contain the variable instead of an instance of Wrapper.
from itertools import chain
nested_list = [['x', 'xx', 'xxx'], ['yy', 'yyy', 'y', 'yyyy'], ['zz', 'z']]
chained_list = list(chain(*nested_list))
metrics_list = [str(chained_list[x]) +'_score' for x in range(len(chained_list))]
zipped_scores = list(zip(chained_list, metrics_list))
# create a simple Wrapper class, so we can essentially have a mutable primitive.
# We can put the Wrapper into two different lists, and modify its value without
# overwriting it.
class Wrapper:
    def __init__(self, value):
        self.value = value
# create a 'duplicate list' of the nested and chained lists, respectively,
# such that each element of these lists is a Wrapper of the corresponding
# element in the above lists
nested_wrappers = [[Wrapper(elem) for elem in sublist] for sublist in nested_list]
chained_wrappers = list(chain(*nested_wrappers))
# now we have two references to the same MUTABLE Wrapper for each element of
# the original lists - one nested, and one chained. If we change a property
# of the chained Wrapper, the change will reflect on the corresponding nested
# Wrapper. Copy the changes from the zipped scores onto the chained wrappers
for score, wrapper in zip(zipped_scores, chained_wrappers):
    wrapper.value = score
# then extract the values in the unchained list of the same wrappers, thus
# preserving both the changes and the original nested organization
unchained_list = [[wrapper.value for wrapper in sublist] for sublist in nested_wrappers]
This ends with unchained_list equal to the following:
[[('x', 'x_score'), ('xx', 'xx_score'), ('xxx', 'xxx_score')], [('yy', 'yy_score'), ('yyy', 'yyy_score'), ('y', 'y_score'), ('yyyy', 'yyyy_score')], [('zz', 'zz_score'), ('z', 'z_score')]]
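The singleton-list variant mentioned above might look like the following. It is a rough sketch rather than a drop-in replacement: each one-item list plays the role of a Wrapper, the names `nested_boxes` and `chained_boxes` are made up for illustration, and the scoring is a stand-in.

```python
from itertools import chain

nested_list = [['x', 'xx', 'xxx'], ['yy', 'yyy', 'y', 'yyyy'], ['zz', 'z']]

# box every element in a one-item list; the box is the shared mutable object
nested_boxes = [[[elem] for elem in sublist] for sublist in nested_list]
# flattening one level yields the very same box objects, in order
chained_boxes = list(chain.from_iterable(nested_boxes))

# stand-in for the real metric: overwrite each box's content in place
for box in chained_boxes:
    box[0] = (box[0], box[0] + '_score')

# read the updated contents back through the nested view
unchained_list = [[box[0] for box in sublist] for sublist in nested_boxes]
print(unchained_list)
```

The change is visible through the nested view because both lists reference the same box objects; only the content of each box is replaced.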
Upvotes: 1