Nucklear

Reputation: 478

Compare two lists in python and print the output

Hi, I have a list of tuples and I need to compare a value from each tuple with values extracted from an XML file. The structure is similar to this:

[('example', '123', 'foo', 'bar'), ('example2', '456', 'foo', 'bar'), ...]

I need to compare the second value of each tuple (index 1) with the values in the XML:

for item in main_list:
    for child in xml_data:
        if item[1] == child.get('value'):
            print(item[1])
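To make the loop above reproducible, here is a minimal self-contained sketch. The XML document, its `<items>`/`<item>` element names and the sample values are hypothetical; the only detail taken from the question is that each element carries its value in a `value` attribute, read with `child.get('value')`. Index 1 (the second field) is used, as the question's prose describes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data in the shape shown in the question.
main_list = [
    ('example', '123', 'foo', 'bar'),
    ('example2', '456', 'foo', 'bar'),
    ('example3', '789', 'foo', 'bar'),
]

# Hypothetical XML: element names are assumptions; only the 'value'
# attribute comes from the question's child.get('value') call.
xml_text = ('<items>'
            '<item value="456"/>'
            '<item value="999"/>'
            '<item value="123"/>'
            '</items>')
xml_data = ET.fromstring(xml_text)  # iterating the root yields its child elements

matches = []
for item in main_list:              # 3 tuples...
    for child in xml_data:          # ...times 3 elements = 9 comparisons
        if item[1] == child.get('value'):
            matches.append(item[1])

print(matches)  # ['123', '456']
```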

The problem is that main_list has a huge number of entries (1000+), and multiplied by the number of values in the XML (100+) this results in over 100,000 comparisons, which makes this method inefficient.

Is there a way to do this efficiently?

Regards.

Upvotes: 1

Views: 208

Answers (1)

Gareth Latty

Reputation: 89077

A membership check on a set will be significantly faster than manually iterating and checking:

# Build the set once; each `in` test is then an average O(1) hash lookup.
children = {child.get('value') for child in xml_data}
for item in main_list:
    if item[1] in children:
        print(item[1])

Here we construct the set with a simple set comprehension.
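A runnable sketch of the set-based check, using hypothetical data (the XML and sample tuples are assumptions, and index 1 stands for the compared field). It finds the same matches as the nested loop while scanning the XML only once.

```python
import xml.etree.ElementTree as ET

# Hypothetical data in the shape described in the question.
main_list = [('example', '123', 'foo', 'bar'),
             ('example2', '456', 'foo', 'bar'),
             ('example3', '789', 'foo', 'bar')]
xml_data = ET.fromstring('<items><item value="456"/><item value="999"/>'
                         '<item value="123"/></items>')

# One pass over the XML builds the set...
children = {child.get('value') for child in xml_data}

# ...then each membership test is an average O(1) hash lookup, so the whole
# check costs roughly len(xml_data) + len(main_list) operations.
matches = [item[1] for item in main_list if item[1] in children]

print(children == {'123', '456', '999'})  # True
print(matches)                            # ['123', '456']
```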

Note that it may be worth swapping which data goes into the set: if main_list is the longer of the two, building the set from it instead may be more efficient.

items = {item[1] for item in main_list}
for child in xml_data:
    value = child.get('value')
    if value in items:
        print(value)

Both versions also extract the values only once, rather than every time a comparison is made.
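To check the speed-up on data of roughly the question's size, a small `timeit` benchmark can compare the two approaches. The data below is synthetic (random numeric strings, with plain strings standing in for `child.get('value')`), and absolute timings vary by machine, so none are shown.

```python
import random
import timeit

random.seed(0)
# Synthetic, hypothetical data: 1000 tuples and 100 distinct XML-like values.
main_list = [('name%d' % i, str(random.randrange(5000)), 'foo', 'bar')
             for i in range(1000)]
xml_values = [str(v) for v in random.sample(range(5000), 100)]


def nested():
    """Compare every tuple against every value: 1000 * 100 checks."""
    return [item[1] for item in main_list
            for value in xml_values if item[1] == value]


def with_set():
    """Build the set once, then do one hash lookup per tuple."""
    children = set(xml_values)
    return [item[1] for item in main_list if item[1] in children]


# With distinct XML values, both functions return the same list.
assert nested() == with_set()

print('nested loop:', timeit.timeit(nested, number=50))
print('set lookup: ', timeit.timeit(with_set, number=50))
```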

Note that a set discards duplicate values and ordering on the side it is built from; if those matter, this isn't a valid solution. Each version keeps only the order and duplicates of the data you iterate over. If that isn't acceptable, you can still extract the values beforehand and use itertools.product() to iterate a little more quickly.

import itertools

items = [item[1] for item in main_list]
children = [child.get('value') for child in xml_data]

for item, child in itertools.product(items, children):
    if item == child:
        print(item)
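`itertools.product(a, b)` yields pairs in the same order as two nested `for` loops, so it reports every matching pair exactly as the original loop does; it still visits every pair, so any gain comes from extracting the values beforehand rather than from fewer comparisons. A quick check with hypothetical, already-extracted values (a duplicate on each side), alongside a set-based version for contrast:

```python
import itertools

# Hypothetical, already-extracted values with a duplicate on each side.
items = ['123', '456', '123']     # stands in for the tuple values
children = ['456', '123', '456']  # stands in for the XML values

# One result per matching pair, in the same order as nested loops.
via_product = [a for a, b in itertools.product(items, children) if a == b]
via_nested = [a for a in items for b in children if a == b]

# Set built from the XML side: duplicates and order come from `items` only.
child_set = set(children)
via_set = [a for a in items if a in child_set]

print(via_product == via_nested)                           # True
print(sum(1 for _ in itertools.product(items, children)))  # 9 pairs visited
print(via_product)                                         # ['123', '456', '456', '123']
print(via_set)                                             # ['123', '456', '123']
```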

As Karl Knechtel points out, if you really don't care about order or duplicates at all, you can just do a set intersection:

for item in ({child.get('value') for child in xml_data} &
             {item[1] for item in main_list}):
    print(item)
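Iteration order over a set is arbitrary, so if repeatable output is wanted the intersection can be sorted before printing. A sketch with hypothetical data (XML and tuples are assumptions); `set.intersection()` is the method form of the `&` operator.

```python
import xml.etree.ElementTree as ET

# Hypothetical data in the shape described in the question.
main_list = [('example', '123', 'foo', 'bar'),
             ('example2', '456', 'foo', 'bar'),
             ('example3', '789', 'foo', 'bar')]
xml_data = ET.fromstring('<items><item value="456"/><item value="999"/>'
                         '<item value="123"/></items>')

xml_values = {child.get('value') for child in xml_data}
tuple_values = {item[1] for item in main_list}

common = xml_values & tuple_values
assert common == xml_values.intersection(tuple_values)

for value in sorted(common):  # sort for a stable, repeatable order
    print(value)
# prints:
# 123
# 456
```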

Upvotes: 6
