Reputation: 511
How do I remove duplicate lines only if they start in a certain way?
Example input:
%start _CreditsInfo
-(half) userCoins {
return 1;
}
%start _CreditsInfo
-(half) userLives {
return 1;
}
Requested output:
%start _CreditsInfo
-(half) userCoins {
return 1;
}
-(half) userLives {
return 1;
}
As you can see, a normal duplicate removal won't work: I only want to remove duplicate lines that start with %start, not other repeated lines such as return x;.
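To illustrate the problem, here is a minimal sketch of a plain order-preserving duplicate removal. The helper name naive_dedupe is made up for this example. It removes the second %start line as wanted, but it also strips the repeated return 1; and } lines, which breaks the second method.

```python
def naive_dedupe(lines):
    """Keep only the first occurrence of every line, whatever it starts with."""
    seen = set()
    out = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            out.append(line)
    return out


sample = [
    "%start _CreditsInfo",
    "-(half) userCoins {",
    "return 1;",
    "}",
    "%start _CreditsInfo",
    "-(half) userLives {",
    "return 1;",
    "}",
]

naive = naive_dedupe(sample)
print("\n".join(naive))
```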
Upvotes: 0
Views: 171
Reputation: 1773
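Compile each line prefix into a regular expression anchored at the start of the line, and keep a set of the prefixes you have already seen. The first line matching a given prefix is kept; later lines matching the same prefix are dropped. Note that this keys on the prefix, not on the whole line, so a later %start line is dropped even if the rest of it differs.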
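import re
from functools import reduce  # a builtin on Python 2; must be imported on Python 3

class DuplicateFinder(object):
    def __init__(self, *prefixes):
        # Each prefix is used as regex source. Wrap it in re.escape() if it
        # contains metacharacters that should match literally.
        self.regexs = [re.compile('^{0}'.format(p)) for p in prefixes]
        # Patterns of the prefixes that have already been seen once.
        self.duplicates = set()

    def not_duplicate(self, line):
        # First match object among the prefix patterns, or a falsy value if none match.
        found = reduce(lambda r, p: r or p.search(line), self.regexs, False)
        if found:
            if found.re.pattern not in self.duplicates:
                self.duplicates.add(found.re.pattern)
                return True
            else:
                return False
        # Lines without a tracked prefix are always kept.
        return True

df = DuplicateFinder('%start', '%other_start')
lines = """%start _CreditsInfo
-(half) userCoins {
return 1;
}
%start _CreditsInfo
-(half) userLives {
return 1;
}""".splitlines()
# Keep only the lines for which not_duplicate() returns True.
result = filter(df.not_duplicate, lines)
print('\n'.join(result))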
Produces:
%start _CreditsInfo
-(half) userCoins {
return 1;
}
-(half) userLives {
return 1;
}
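If the goal is to drop only exact repeats of a prefixed line (so that a second %start line with different text survives), a variant that remembers whole lines instead of prefixes may fit better. The following is a minimal sketch under that assumption, not the class above; the function name dedupe_prefixed and its parameters are made up for illustration.

```python
def dedupe_prefixed(lines, prefixes):
    """Yield lines, skipping exact repeats of lines that start with a prefix."""
    prefixes = tuple(prefixes)  # str.startswith accepts a tuple of candidates
    seen = set()
    for line in lines:
        if line.startswith(prefixes):
            if line in seen:
                continue  # exact repeat of a prefixed line: drop it
            seen.add(line)
        # Lines without a tracked prefix always pass through unchanged.
        yield line


text = """%start _CreditsInfo
-(half) userCoins {
return 1;
}
%start _CreditsInfo
-(half) userLives {
return 1;
}"""

result = list(dedupe_prefixed(text.splitlines(), ["%start"]))
print("\n".join(result))
```

Because it uses plain string prefixes rather than regular expressions, no escaping is needed, and repeated lines such as return 1; are left alone.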
Upvotes: 1