If something exists in both sides, compare the results

Question

Having two strings:

machine1 665600MB 512512MB 19%                    
machine2 53248MB  41000MB  20%  
machine3 625600MB 522512MB 22%

and:

machine1 665600MB 512512MB 21%                    
machine2 53248MB  41000MB  22%  
machine3 625600MB 522512MB 21%
machine5 53248MB  41000MB  23%

I would like to compare the two, but only for the machines that appear in both strings (machine1, machine2 and machine3), ignoring machine5. This must work in both directions: anything that exists in one string but not in the other must be ignored.

To compare both strings I use this:

avoid = {x.rstrip() for x in string2.splitlines()}
result = "\n".join(x for x in string1.splitlines() if x.rstrip() not in avoid)

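But this only finds the differences on one side: it returns the lines of string1 that are not in string2, and never looks at which machines are missing from either string.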

Tom · Accepted Answer

My thought is to use regex to identify the machines in each string and their intersection:

import re

string1 = '''machine1 665600MB 512512MB 19%
machine2 53248MB  41000MB  20%
machine3 625600MB 522512MB 22%'''

string2 = '''machine1 665600MB 512512MB 21%
machine2 53248MB  41000MB  22%
machine3 625600MB 522512MB 21%
machine5 53248MB  41000MB  23%'''

pat = r'machine\d+'  # raw string, so \d reaches the regex engine unchanged

machines1 = re.findall(pat, string1)
machines2 = re.findall(pat, string2)
intersect = set(machines1) & set(machines2)
# {'machine1', 'machine2', 'machine3'}

Then subset based on that intersection, using the same split-and-join that you did above:

newstring1 = '\n'.join(line for line in string1.splitlines() if
                       re.search(pat, line).group() in intersect)
newstring2 = '\n'.join(line for line in string2.splitlines() if
                       re.search(pat, line).group() in intersect)

The result is these two new strings:

>>> print(newstring1)
machine1 665600MB 512512MB 19%
machine2 53248MB  41000MB  20%
machine3 625600MB 522512MB 22%

>>> print(newstring2)
machine1 665600MB 512512MB 21%
machine2 53248MB  41000MB  22%
machine3 625600MB 522512MB 21%
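
One caveat with the filtering step: re.search returns None for a line that has no machine name (a blank line, for instance), so calling .group() on it would raise AttributeError. Here is a minimal sketch of a guarded version. The helper name keep_common and the shortened sample strings are made up for illustration and are not part of the original answer:

```python
import re

PAT = re.compile(r'machine\d+')

def keep_common(text, common):
    """Return only the lines of text whose machine name is in common."""
    kept = []
    for line in text.splitlines():
        match = PAT.search(line)
        # Skip lines with no machine name instead of calling .group() on None.
        if match and match.group() in common:
            kept.append(line)
    return '\n'.join(kept)

# Shortened sample data (hypothetical); note the blank line in string1.
string1 = 'machine1 665600MB 512512MB 19%\n\nmachine2 53248MB  41000MB  20%'
string2 = 'machine1 665600MB 512512MB 21%\nmachine5 53248MB  41000MB  23%'

common = set(PAT.findall(string1)) & set(PAT.findall(string2))
print(keep_common(string1, common))
print(keep_common(string2, common))
```

With this sample data only machine1 is present on both sides, so each filtered string should contain just its machine1 line.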

How you want to "compare" them is a little vague, but the two new strings should only contain records for the same machines.
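
If "compare" means "show which values changed for each shared machine", one possible reading is sketched below. It assumes the first whitespace-separated field on each line is the machine name and the remaining fields are the values to compare. The function names parse and compare_common are hypothetical:

```python
def parse(text):
    """Map machine name -> list of remaining columns, skipping empty lines."""
    records = {}
    for line in text.splitlines():
        fields = line.split()
        if fields:
            records[fields[0]] = fields[1:]
    return records

def compare_common(text1, text2):
    """Return {machine: (columns1, columns2)} for machines present in both
    texts whose columns differ."""
    rec1, rec2 = parse(text1), parse(text2)
    return {name: (rec1[name], rec2[name])
            for name in sorted(rec1.keys() & rec2.keys())
            if rec1[name] != rec2[name]}

string1 = '''machine1 665600MB 512512MB 19%
machine2 53248MB  41000MB  20%
machine3 625600MB 522512MB 22%'''

string2 = '''machine1 665600MB 512512MB 21%
machine2 53248MB  41000MB  22%
machine3 625600MB 522512MB 21%
machine5 53248MB  41000MB  23%'''

# Print the last column (the percentage) before and after for each machine.
for name, (old, new) in compare_common(string1, string2).items():
    print(name, old[-1], '->', new[-1])
```

Because the intersection of the dictionary keys is taken before anything is compared, machine5 never enters the result, and the same would hold for a machine that appeared only in string1. Splitting on whitespace also means differences in column spacing alone are not reported as changes.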
