Hashmi

Reputation: 147

Deleting duplicate list items from a list of lists of lists

I have a list of lists of lists as follows:

All_Data= [[['Chemical Name', 'Average Ret. Time', 'Maximum Area'],
 ['1-hexanol', 5.10, 2544937.0], ['1-hexanol', 8.69, 3798101.0],
 ['1-hexanol', 5.54, 2470679.0], ['2-propanone-1-hydroxy-', 1.97, 227607.0], 
 ['acetic acid', 1.962, 227607.0], ['acetic acid', 1.75, 38359423.0], 
 ['acetoin', 2.32, 478054.0]],
[['Chemical Name', 'Average Ret. Time', 'Maximum Area'], ['1-pentanol', 3.00, 24864.0], 
 ['2-heptanone', 5.54, 10027158.0], ['2-pentanone', 2.10, 858204.0], 
 ['2-pentanone', 2.03, 858204.0], ['2-pentanone', 2.037, 858204.0], 
 ['2-pentanone', 1.97, 858204.0], ['pentane, 2,3,3-trimethyl-', 2.84, 1775913.0], 
 ['pentane, 2,3,4-trimethyl-', 2.75, 807020.0]],
[['Chemical Name', 'Average Ret. Time', 'Maximum Area'], ['.alpha.-pinene', 7.00, 8190.0], 
 ['.alpha.-pinene', 8.729, 21582890.0], ['ethyl hexanoate', 9.47, 71863418.0], 
 ['nonanal', 13.93, 10301295.0], ['pentanoic acid, ethyl ester', 5.88, 19659678.0],
 ['propanoic acid, ethyl ester', 2.30, 8107638.0]]]

So the list has three levels: "All_Data" contains three main sublists, and each main sublist holds its data rows as further sublists. I want to compare the rows within each main sublist independently: if the first item of two rows matches, I want to keep one row and delete the other. For example, in the first main sublist '1-hexanol' appears three times, and I want to keep just one row:

['1-hexanol', 5.10, 2544937.0]

and delete the other two:

['1-hexanol', 8.69, 3798101.0], ['1-hexanol', 5.54, 2470679.0]

I tried the following code but it gives the error: "TypeError: 'int' object is not subscriptable".

Code:

for i in All_Data:
    for j in range(0, len(i)):
        for k in range(1, len(i)):
            if i[j[0]] == i[k[0]]:
                del i[k[0]]

Please help me on this.

Kind Regards, Ali

Upvotes: 1

Views: 75

Answers (2)

shizhz

Reputation: 12501

While @Prune has provided a detailed explanation of the error in your code, I'd like to offer an alternative solution to your problem.

Basically, you can define a function remove_duplicate that takes a 2nd-level list and removes 3rd-level lists whose first element has already been seen, and then generate your final result with a list comprehension:

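def remove_duplicate(sublist):
    # Names (first elements) already encountered in this sublist.
    seen = set()
    # set.add() returns None, so the `or` branch records a new name
    # and still evaluates as falsy, which keeps that row.
    return [e for e in sublist if not (e[0] in seen or seen.add(e[0]))]

# Apply the function to each second-level list independently.
result = [remove_duplicate(sublist) for sublist in All_Data]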
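
As a quick check, here is a small sketch that runs the same function on a shortened copy of the first sublist from the question. The names sample and deduped are made up for this illustration; the rows are taken from the question's data.

```python
def remove_duplicate(sublist):
    seen = set()
    # Keep a row only the first time its name (e[0]) is encountered.
    return [e for e in sublist if not (e[0] in seen or seen.add(e[0]))]

# Hypothetical shortened sample, using rows from the question.
sample = [
    ['Chemical Name', 'Average Ret. Time', 'Maximum Area'],
    ['1-hexanol', 5.10, 2544937.0],
    ['1-hexanol', 8.69, 3798101.0],
    ['1-hexanol', 5.54, 2470679.0],
    ['acetic acid', 1.962, 227607.0],
    ['acetic acid', 1.75, 38359423.0],
]

deduped = remove_duplicate(sample)
for row in deduped:
    print(row)
```

This should leave the header row plus the first '1-hexanol' row and the first 'acetic acid' row. Note that the function builds a new list, so sample itself is left unchanged.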

Upvotes: 0

Prune

Reputation: 77837

The error message tells you the problem: you can't subscript an integer. j and k are integers.

if i[j[0]] == i[k[0]]:

Perhaps you meant to use them as the first index in a 2D expression:

if i[j][0] == i[k][0]:
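
To make the difference concrete, here is a minimal sketch using two rows from the question. The names rows, message, and same_name are only for illustration.

```python
rows = [['1-hexanol', 5.10, 2544937.0], ['1-hexanol', 8.69, 3798101.0]]
j, k = 0, 1

try:
    rows[j[0]]              # j[0] is evaluated first, and j is an int
except TypeError as exc:
    message = str(exc)      # the same TypeError reported in the question

# Index the row first, then take its first item.
same_name = rows[j][0] == rows[k][0]
print(message)
print(same_name)
```

The first form tries to subscript the integer before it ever touches the list; the second picks row j and then element 0 of that row.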

UPDATE per OP's comment (second problem):

Ah, yes -- this is an old problem: you're shortening a list while you're still stepping through it. The code doesn't work the way you want: every time you delete a row, you change the indices of the later rows. First, you miss a row; second, your loop is trying to run through the original number of rows.

For instance, you start with 10 rows, with rows 3, 4, and 6 (of 0 - 9) having the same first element as row 0. With j=0, you run k from 1 through 9.

When k reaches 3, you find the duplicate and delete row 3. You then move on to k=4 ... except that the original row 4 is now at index 3, so the row you're now looking at is the original row 5, and the duplicate in the original row 4 is never checked. You pass that, reach the original row 6 (now at index 5), and delete that one as well. You continue through the rows that remain, and then try index 8 ...

Except that there is no index 8 remaining in the list. Your loop depends on a range that was computed once, from the original length, and runs through 9: that doesn't change as you alter the list. k is now out of range, and you get an IndexError.
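
Here is a small sketch that reproduces this failure on a made-up ten-row list. The second item of each row is just its original position so you can see which rows were examined; rows 3, 4, and 6 repeat the first element of the row being compared (index 0 here). All names are hypothetical.

```python
rows = [['a', 0], ['b', 1], ['c', 2], ['a', 3], ['a', 4],
        ['d', 5], ['a', 6], ['e', 7], ['f', 8], ['g', 9]]

visited = []   # original positions of the rows the loop actually looked at
error = None
try:
    for k in range(1, len(rows)):       # the range is fixed at 1..9 up front
        visited.append(rows[k][1])
        if rows[k][0] == rows[0][0]:
            del rows[k]                 # later rows shift down by one
except IndexError as exc:
    error = exc

print(visited)
print(rows)
print(repr(error))
```

The visited list shows that the original rows 4 and 7 were never examined, so one duplicate survives, and the loop ends with an IndexError once k passes the end of the shortened list.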

REPAIR: The general solution is to mark rows for later deletion as you find them. When you leave the main marking loop, make a second pass to remove anything marked for deletion. Again, be careful not to skip rows: either work backwards, or use a while loop and increment the index only when you keep the row.
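
Both variants described above can be sketched roughly as follows. This is only one way to do it: the function names drop_marked and drop_while are made up, and tracking the names already seen in a set is my own assumption rather than part of the original code.

```python
def drop_marked(rows):
    """Mark duplicate rows first, then delete them working backwards."""
    seen = set()
    marked = []
    for index, row in enumerate(rows):
        if row[0] in seen:
            marked.append(index)    # mark only; do not delete yet
        else:
            seen.add(row[0])
    # Deleting from the end keeps the earlier marked indices valid.
    for index in reversed(marked):
        del rows[index]
    return rows


def drop_while(rows):
    """Delete in place, advancing the index only when a row is kept."""
    seen = set()
    index = 0
    while index < len(rows):        # len(rows) is re-checked on every pass
        name = rows[index][0]
        if name in seen:
            del rows[index]         # the next row slides into this index
        else:
            seen.add(name)
            index += 1
    return rows


# Shortened sample built from rows in the question.
sample = [
    ['Chemical Name', 'Average Ret. Time', 'Maximum Area'],
    ['2-pentanone', 2.10, 858204.0],
    ['2-pentanone', 2.03, 858204.0],
    ['2-heptanone', 5.54, 10027158.0],
    ['2-pentanone', 1.97, 858204.0],
]
marked_result = drop_marked([row[:] for row in sample])
while_result = drop_while([row[:] for row in sample])
print(marked_result)
print(while_result)
```

Either function can be applied to each main sublist in turn, for example with a loop over All_Data. Both modify the list they are given, which is why the sketch passes in copies.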

Upvotes: 1
