Remove Duplicate Lines Starting With Certain String

Question

How do I remove duplicate lines only if they start in a certain way?

Example input:

%start _CreditsInfo
-(half) userCoins {
return 1;
}
%start _CreditsInfo
-(half) userLives {
return 1;
}

Requested output:

%start _CreditsInfo
-(half) userCoins {
return 1;
}
-(half) userLives {
return 1;
}

As you can see, a normal duplicate removal won't work: I only want to remove duplicates of lines starting with %start, and other repeated lines, such as return x;, must stay.

Mike Robins · Accepted Answer

Compile each line prefix into a regular expression and keep a set of the prefixes you have already seen.

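import re
from functools import reduce  # reduce is no longer a builtin in Python 3

class DuplicateFinder(object):

    def __init__(self, *prefixes):
        # One regex per prefix, anchored at the start of the line.
        self.regexs = [re.compile('^{0}'.format(p)) for p in prefixes]
        # Patterns of the prefixes that have already matched a line.
        self.duplicates = set()

    def not_duplicate(self, line):
        # First successful match among the prefix regexes, else a falsy value.
        found = reduce(lambda r, p: r or p.search(line), self.regexs, False)
        if found:
            if found.re.pattern not in self.duplicates:
                # First line seen with this prefix: remember it and keep the line.
                self.duplicates.add(found.re.pattern)
                return True
            else:
                # A line with this prefix was already kept: drop this one.
                return False
        # Lines without a tracked prefix are always kept.
        return True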

df = DuplicateFinder('%start', '%other_start')


lines = """%start _CreditsInfo
-(half) userCoins {
return 1;
}
%start _CreditsInfo
-(half) userLives {
return 1;
}""".splitlines()

result = filter(df.not_duplicate, lines)

print('\n'.join(result))

Produces:

%start _CreditsInfo
-(half) userCoins {
return 1;
}
-(half) userLives {
return 1;
}
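
One thing to be aware of: the class above keys on the prefix pattern, not on the whole line, so once one %start line has been kept, every later line beginning with %start is dropped, even if the rest of the line differs. If you only want to drop exact repeats of prefixed lines, a minimal sketch could look like the following. The function name dedupe_prefixed_lines and the extra %start _OtherInfo line are made up for illustration and are not part of the original question or answer.

```python
def dedupe_prefixed_lines(lines, prefixes):
    """Yield lines, dropping exact repeats of lines that start with a prefix."""
    prefixes = tuple(prefixes)  # str.startswith accepts a tuple of candidates
    seen = set()
    for line in lines:
        if line.startswith(prefixes):
            if line in seen:
                continue  # exact repeat of a prefixed line: skip it
            seen.add(line)
        yield line  # all other lines pass through unchanged


# Sample input from the question, plus one hypothetical extra %start line.
text = """%start _CreditsInfo
-(half) userCoins {
return 1;
}
%start _CreditsInfo
-(half) userLives {
return 1;
}
%start _OtherInfo"""

result = list(dedupe_prefixed_lines(text.splitlines(), ["%start"]))
print("\n".join(result))
```

Here the second %start _CreditsInfo is removed, while %start _OtherInfo and both return 1; lines are kept. Plain str.startswith is used instead of regular expressions, which assumes the prefixes are literal strings.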
