Zaif Senpai
Zaif Senpai

Reputation: 91

To make this methods faster

I have written this function in python 3 to merge 2 xml files.

The merging is being done at first level so it does not need to call itself recursively. The problem is that it is taking a lot of time because the xml files are large. Please help me optimize this code. Thanks

This is the function:

def combine_element(one, other):
    channel_ids = []
    programs_startstop = []

    for el in one:
        if el.tag == 'channel':
            channel_ids.append(el.get('id'))
        elif el.tag == 'programme':
            programs_startstop.append((el.get('start'), el.get('stop')))

    i = 0
    printProgressBar(i, len(other), prefix = 'Progress:', suffix = 'Complete', length = 50)
    for el in other:
        if el.tag == 'channel':
            if not el.get('id') in channel_ids:
                one.append(el)
                channel_ids.append(el.get('id'))
        elif el.tag == 'programme':
            if not (el.get('start'), el.get('stop')) in programs_startstop:
                one.append(el)
                programs_startstop.append((el.get('start'), el.get('stop')))
        i += 1
        printProgressBar(i, len(other), prefix = 'Progress:', suffix = 'Complete', length = 50)

This is an example of xml files to merge:

First file:

<tv>
 <channel id="C1">
  <display-name lang="en">C1</display-name>
 </channel>
 <channel id="C2">
  <display-name lang="en">C2</display-name>
 </channel>
 <programme channel="C1" start="20190607040000 +0000" stop="20190607043000 +0000">
  <title lang="en">P1</title>
  <desc lang="en">Program 1</desc>
 </programme>
 <programme channel="C2" start="20190707040000 +0000" stop="20190707043000 +0000">
  <title lang="en">P2</title>
  <desc lang="en">Program 2</desc>
 </programme>
</tv>

Second file:

<tv>
 <channel id="C3">
  <display-name lang="en">C3</display-name>
 </channel>
 <channel id="C4">
  <display-name lang="en">C4</display-name>
 </channel>
 <programme channel="C3" start="20190607070000 +0000" stop="20190607073000 +0000">
  <title lang="en">P3</title>
  <desc lang="en">Program 3</desc>
 </programme>
 <programme channel="C4" start="20190707050000 +0000" stop="20190707063000 +0000">
  <title lang="en">P4</title>
  <desc lang="en">Program 2</desc>
 </programme>
</tv>

The code is supposed to ignore the element in second file is it has same id and ignore the programme in second file if it has same start and stop times in the first file. The xml code given here is an example because I can not share actual data.

This is the expected outcome of the method, but faster:

<tv>
<channel id="C1">
  <display-name lang="en">C1</display-name>
 </channel>
 <channel id="C2">
  <display-name lang="en">C2</display-name>
 </channel>
<programme channel="C1" start="20190607040000 +0000" stop="20190607043000 +0000">
  <title lang="en">P1</title>
  <desc lang="en">Program 1</desc>
 </programme>
 <programme channel="C2" start="20190707040000 +0000" stop="20190707043000 +0000">
  <title lang="en">P2</title>
  <desc lang="en">Program 2</desc>
 </programme>
 <channel id="C3">
  <display-name lang="en">C3</display-name>
 </channel>
 <channel id="C4">
  <display-name lang="en">C4</display-name>
 </channel>
 <programme channel="C3" start="20190607070000 +0000" stop="20190607073000 +0000">
  <title lang="en">P3</title>
  <desc lang="en">Program 3</desc>
 </programme>
 <programme channel="C4" start="20190707050000 +0000" stop="20190707063000 +0000">
  <title lang="en">P4</title>
  <desc lang="en">Program 2</desc>
 </programme>
</tv>

Upvotes: 0

Views: 50

Answers (1)

Oluwafemi Sule
Oluwafemi Sule

Reputation: 38982

You should extract where you retrieve elements to a generator function that yields a key-value tuple pair.

Create dictionaries from the results of invoking the generator function on both arguments and merge the dictionary.

def elements(lst):
    for el in lst:
        if el.tag == 'channel':
            yield el.get('id'), el
        if el.tag == 'programme':
            yield (el.get('start'), el.get('stop')), el

def combine_element(one, other):
    one_els = elements(one)
    other_els = elements(other)

    merged_els = dict(other_els)
    merged_els.update(one_els)

    result_els = []
    progressend = len(merged_els)
    for i, (_k, el) in enumerate(merged_els.items()):
        printProgressBar(
            i, progressend, prefix='Progress:', suffix='Complete', length=50)
        result_els.append(el)

    return result_els

Upvotes: 1

Related Questions