How to filter a 2 level list based on values within sublists?

Question

Think of the following list as a table, where sublist[0] contains the column headers.

data = [
            ['S1', 'S2 ', 'ELEMENT', 'C1', 'C2'], 
            ['X' ,  'X' , 'GRT'    ,  1,    4  ], 
            [''  ,  'X' , 'OIP'    ,  3,    2  ], 
            [''  ,  'X' , 'LKJ'    ,  2,    7  ], 
            ['X' ,  ''  , 'UBC'    ,  1,    0  ]
        ]

I'm trying to filter the list based on the values in "column S1" and "column S2".

I want to get:

a new list "S1" containing the sublists that have an "X" in "column S1"
a new list "S2" containing the sublists that have an "X" in "column S2"

Like this:

S1 = [
            ['ELEMENT', 'C1', 'C2'], 
            ['GRT',      1,    4  ], 
            ['UBC',      1,    0  ]
        ]       

S2 = [
            ['ELEMENT', 'C1', 'C2'], 
            ['GRT',      1,    4  ], 
            ['OIP',      3,    2  ], 
            ['LKJ',      2,    7  ]
        ]

Below is the code I have so far. I make a copy of the source list data and then remove each sublist that doesn't have "X" in "column S1". I get the correct content in the new list S1, but I don't understand why the source list data is also being modified, so I can't use it to get the new list S2.

S1 = data
for sublist in S1[1:]:
    if sublist[0] != "X":   # drop rows without "X" in column S1
        S1.remove(sublist)

S2 = data
for sublist in S2[1:]:
    if sublist[1] != "X":   # drop rows without "X" in column S2
        S2.remove(sublist)


>>> data
[['S1', 'S2 ', 'ELEMENT', 'C1', 'C2'], ['X', 'X', 'GRT', 1, 4], ['X', '', 'UBC', 1, 0]]
>>> S1
[['S1', 'S2 ', 'ELEMENT', 'C1', 'C2'], ['X', 'X', 'GRT', 1, 4], ['X', '', 'UBC', 1, 0]]
>>>

What would be a better way to get lists S1 and S2? Thanks.

L3viathan · Accepted Answer

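The problem is that simply assigning the list to a new name does not make a copy. After S1 = data, both names refer to the same list object, so every row you remove through S1 is also gone from data.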
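You can see this in the interpreter by checking object identity with is. The following is a minimal sketch on a shortened copy of the question's data; alias and copied are illustrative names only.

```python
data = [
    ['S1', 'S2 ', 'ELEMENT', 'C1', 'C2'],
    ['X', 'X', 'GRT', 1, 4],
    ['', 'X', 'OIP', 3, 2],
]

alias = data     # plain assignment: a second name for the same list object
copied = data[:] # slicing: a brand-new outer list

print(alias is data)   # True
print(copied is data)  # False

alias.remove(alias[2])  # removing a row through the alias...
print(len(data))        # ...also removes it from data: 2
print(len(copied))      # the copy still has all rows: 3
```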

You can make your solution work by copying the list instead:

S1 = data[:]  # slicing makes a (shallow) copy of the outer list
S2 = data[:]
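Applied to the loops from the question, the fix might look like this minimal sketch (data is repeated so the snippet runs on its own). Each loop now removes rows from its own copy, so data keeps all five rows.

```python
data = [
    ['S1', 'S2 ', 'ELEMENT', 'C1', 'C2'],
    ['X', 'X', 'GRT', 1, 4],
    ['', 'X', 'OIP', 3, 2],
    ['', 'X', 'LKJ', 2, 7],
    ['X', '', 'UBC', 1, 0],
]

# Each result gets its own outer list; data itself is never modified.
S1 = data[:]
for sublist in S1[1:]:      # iterate over a slice (a snapshot), so removing from S1 is safe
    if sublist[0] != "X":
        S1.remove(sublist)

S2 = data[:]
for sublist in S2[1:]:
    if sublist[1] != "X":
        S2.remove(sublist)

print(len(data))  # 5
print(S1)  # [['S1', 'S2 ', 'ELEMENT', 'C1', 'C2'], ['X', 'X', 'GRT', 1, 4], ['X', '', 'UBC', 1, 0]]
```

The copies are shallow: data, S1 and S2 still share the same inner row lists. That is harmless here because only the outer lists are modified, but changing a value inside a row of S1 would change it in data too; copy.deepcopy gives fully independent rows if you need that. This version also keeps the S1 and S2 columns in every row, which the generic solution below drops.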

Here's a generic solution:

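def split_from_columns(ls, i_columns=(), indicator='X'):
    # Yield one filtered table per indicator column index in i_columns.
    for i in i_columns:
        yield [
            # copy the row, dropping every indicator column
            [v for k, v in enumerate(sl) if k not in i_columns]
            for j, sl in enumerate(ls)
            # keep the header (row 0) and rows marked in column i
            if j == 0 or sl[i] == indicator
        ]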

Usage:

>>> S1, S2 = split_from_columns(data, i_columns=(0, 1))
>>> S1
[['ELEMENT', 'C1', 'C2'], ['GRT', 1, 4], ['UBC', 1, 0]]
>>> S2
[['ELEMENT', 'C1', 'C2'], ['GRT', 1, 4], ['OIP', 3, 2], ['LKJ', 2, 7]]

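The if j == 0 condition ensures the header row is always included. You can change i_columns to match wherever the indicator columns are in your table. Because split_from_columns is a generator, unpack it as shown or wrap it in list() if you need to reuse the result.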
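If you would rather select the indicator columns by header text than by position, a variant along these lines should work. split_by_header_names is a hypothetical helper, not part of the original answer. Note that the sample header contains 'S2 ' with a trailing space, so this sketch strips whitespace before matching. It returns a list instead of a generator, so the result can be indexed or reused.

```python
def split_by_header_names(ls, names, indicator='X'):
    """Hypothetical helper: split a table (header row first) into one
    table per indicator column, choosing columns by header text."""
    # Strip whitespace so a header like 'S2 ' still matches 'S2'.
    header = [str(h).strip() for h in ls[0]]
    # list.index raises ValueError if a requested name is not in the header.
    i_columns = [header.index(name) for name in names]
    tables = []
    for i in i_columns:
        tables.append([
            # Copy the row without any of the indicator columns.
            [v for k, v in enumerate(row) if k not in i_columns]
            for j, row in enumerate(ls)
            # Always keep the header row; otherwise require the indicator.
            if j == 0 or row[i] == indicator
        ])
    return tables


data = [
    ['S1', 'S2 ', 'ELEMENT', 'C1', 'C2'],
    ['X', 'X', 'GRT', 1, 4],
    ['', 'X', 'OIP', 3, 2],
    ['', 'X', 'LKJ', 2, 7],
    ['X', '', 'UBC', 1, 0],
]

S1, S2 = split_by_header_names(data, ['S1', 'S2'])
print(S1)  # [['ELEMENT', 'C1', 'C2'], ['GRT', 1, 4], ['UBC', 1, 0]]
print(S2)  # [['ELEMENT', 'C1', 'C2'], ['GRT', 1, 4], ['OIP', 3, 2], ['LKJ', 2, 7]]
```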
