okoboko

Reputation: 4482

Python: flatten a list while preserving nested structure for certain indexes

I found several posts about flattening/collapsing lists in Python, but none that covers this case:

Input:

[a_key_1, a_key_2, a_value_1, a_value_2]
[b_key_1, b_key_2, b_value_1, b_value_2]
[a_key_1, a_key_2, a_value_3, a_value_4]
[a_key_1, a_key_3, a_value_5, a_value_6]

Output:

[a_key_1, a_key_2, [a_value_1, a_value_3], [a_value_2, a_value_4]]
[b_key_1, b_key_2, [b_value_1], [b_value_2]]
[a_key_1, a_key_3, [a_value_5], [a_value_6]]

I want to collapse the lists so that there is only one entry per unique pair of keys, with the remaining values combined, position by position, into nested lists after those keys.

EDIT: The first two elements in the input will always be the keys; the last two elements will always be the values.

Is this possible?

Upvotes: 0

Views: 160

Answers (2)

Amadan

Reputation: 198334

data = [
    ["a_key_1", "a_key_2", "a_value_1", "a_value_2"],
    ["b_key_1", "b_key_2", "b_value_1", "b_value_2"],
    ["a_key_1", "a_key_2", "a_value_3", "a_value_4"],
    ["a_key_1", "a_key_3", "a_value_5", "a_value_6"],
]

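from itertools import groupby

# Group on the first two elements (the keys).
keyfunc = lambda row: (row[0], row[1])

# groupby only merges adjacent rows, so sort by the same key first.
# zip(*group) transposes the grouped rows; [2:] drops the key columns.
print([
    list(key) + [list(zipped) for zipped in list(zip(*group))[2:]]
    for key, group
    in groupby(sorted(data, key=keyfunc), keyfunc)
])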


# => [['a_key_1', 'a_key_2', ['a_value_1', 'a_value_3'], ['a_value_2', 'a_value_4']],
#     ['a_key_1', 'a_key_3', ['a_value_5'], ['a_value_6']],
#     ['b_key_1', 'b_key_2', ['b_value_1'], ['b_value_2']]]

For more information, see the Python documentation for itertools.groupby.
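
One detail worth spelling out: groupby only merges consecutive rows with equal keys, which is why the data is sorted before grouping. A minimal sketch of the difference, using made-up rows (the names rows, first, unsorted_keys and sorted_keys are illustrative only, not part of the code above):

```python
from itertools import groupby

# Hypothetical rows; the first element of each row plays the role of the key.
rows = [["a", 1], ["b", 2], ["a", 3]]


def first(row):
    """Return the grouping key of a row."""
    return row[0]


# Without sorting, the two "a" rows are not adjacent, so "a" shows up twice.
unsorted_keys = [key for key, _ in groupby(rows, first)]

# After sorting with the same key function, each key shows up exactly once.
sorted_keys = [key for key, _ in groupby(sorted(rows, key=first), first)]

print(unsorted_keys)  # ['a', 'b', 'a']
print(sorted_keys)    # ['a', 'b']
```

The cost of sorting is that the result comes back in key order rather than in the order the keys first appeared in the input.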

Upvotes: 1

Tyson

Reputation: 424

Yes, it's possible. Here's a function (with doctest from your input/output) that performs the task:

#!/usr/bin/env python
"""Flatten lists as per http://stackoverflow.com/q/30387083/253599."""

from collections import OrderedDict


def flatten(key_length, *args):
    """
    Take lists having key elements and collect remainder into result.

    >>> flatten(1,
    ...         ['A', 'a1', 'a2'],
    ...         ['B', 'b1', 'b2'],
    ...         ['A', 'a3', 'a4'])
    [['A', ['a1', 'a2'], ['a3', 'a4']], ['B', ['b1', 'b2']]]

    >>> flatten(2,
    ...         ['A1', 'A2', 'a1', 'a2'],
    ...         ['B1', 'B2', 'b1', 'b2'],
    ...         ['A1', 'A2', 'a3', 'a4'],
    ...         ['A1', 'A3', 'a5', 'a6'])
    [['A1', 'A2', ['a1', 'a2'], ['a3', 'a4']], ['B1', 'B2', ['b1', 'b2']], ['A1', 'A3', ['a5', 'a6']]]
    """
    result = OrderedDict()
    for vals in args:
        result.setdefault(
            tuple(vals[:key_length]), [],
        ).append(vals[key_length:])
    return [
        list(key) + list(vals)
        for key, vals
        in result.items()
    ]


if __name__ == '__main__':
    import doctest
    doctest.testmod()

(Edited to work with both your original question and the edited question)
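
Note that flatten collects the values row by row (['a1', 'a2'], ['a3', 'a4']). If the column-wise shape from the question is wanted instead ([a_value_1, a_value_3], [a_value_2, a_value_4]), one possible variant is to transpose each group with zip. This is only a sketch: the name flatten_columns is made up for illustration, and it assumes Python 3.7+, where plain dicts preserve insertion order.

```python
def flatten_columns(key_length, *rows):
    """Group rows by their first key_length elements and collect the
    remaining elements column by column, keeping first-seen key order."""
    grouped = {}  # assumption: Python 3.7+, so insertion order is kept
    for row in rows:
        key = tuple(row[:key_length])
        grouped.setdefault(key, []).append(row[key_length:])
    return [
        # zip(*value_rows) turns a list of rows into a sequence of columns.
        list(key) + [list(column) for column in zip(*value_rows)]
        for key, value_rows in grouped.items()
    ]


result = flatten_columns(
    2,
    ["a_key_1", "a_key_2", "a_value_1", "a_value_2"],
    ["b_key_1", "b_key_2", "b_value_1", "b_value_2"],
    ["a_key_1", "a_key_2", "a_value_3", "a_value_4"],
    ["a_key_1", "a_key_3", "a_value_5", "a_value_6"],
)
for entry in result:
    print(entry)
# ['a_key_1', 'a_key_2', ['a_value_1', 'a_value_3'], ['a_value_2', 'a_value_4']]
# ['b_key_1', 'b_key_2', ['b_value_1'], ['b_value_2']]
# ['a_key_1', 'a_key_3', ['a_value_5'], ['a_value_6']]
```

Unlike a sort-then-group approach, this keeps the keys in the order they first appear in the input and needs only a single pass over the rows.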

Upvotes: 3
