Reputation: 85
I have a multiline string that contains some repeated lines. I want to remove not just the repeated copies, but also the "original" line that is repeated.
I found some answers about removing only the repeated copies while keeping the original, but I didn't know how to adapt them, and my attempts failed.
text = """<br/>
Somewhere in China there is a copy of this vid.<br/>
2 years ago<br/>
Not sure really<br/>
Aiur Productions<br/>
Aiur Productions<br/>
2 years ago<br/>
"""
lines_seen = set()  # holds lines already seen
for line in text:
    if line not in lines_seen:  # not a duplicate
        print(lines_seen.add(line))
I got several rows of "None". As mentioned, the code above comes from a different question, where the asker wanted to remove repeated lines but keep the non-repeated ones and one copy of each repeated one. What I want is output like this:
Somewhere in China there is a copy of this vid.
Not sure really
with all duplicated lines (e.g. "2 years ago") removed, so that only the lines that were not repeated in the original are left.
Upvotes: 0
Views: 145
Reputation: 24691
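set.add() doesn't return anything (it returns None), so when you print its return value you get None. If you want to both print the line and put it into the set, you need two separate statements. Also note that iterating directly over a string (for line in text) yields single characters, not lines; use text.splitlines() to iterate over lines: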
lines_seen = set()  # holds lines already seen
for line in text.splitlines():  # iterate over lines, not characters
    if line not in lines_seen:  # not a duplicate
        print(line)
        lines_seen.add(line)
This prints every line once, at its first appearance. If you want to print only the lines that are never duplicated, I would recommend keeping a parallel list of lines that have not (yet) been repeated:
lines_seen = set()
unique_lines = list()
for line in text.splitlines():
    if line not in lines_seen:
        # first appearance: remember it and tentatively keep it
        lines_seen.add(line)
        unique_lines.append(line)
    elif line in unique_lines:
        # second appearance: it is a duplicate, so drop the original as well
        unique_lines.remove(line)
# and then print all the lines that were not removed from unique_lines on their second appearance,
# in the order that they first appeared
for line in unique_lines:
    print(line)
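The list-based version above calls line in unique_lines and unique_lines.remove(line), and both scan the whole list, so it slows down on large inputs. A minimal sketch of the same single-pass idea, assuming Python 3.7+ (where dicts are guaranteed to preserve insertion order), uses a dict as an ordered set instead. The function name lines_never_repeated is illustrative, and the sample drops the <br/> markup for simplicity:

```python
def lines_never_repeated(text):
    """Return the lines that occur exactly once, in order of first appearance."""
    seen = set()   # every line encountered so far
    unique = {}    # candidate lines; dict keys keep first-appearance order (3.7+)
    for line in text.splitlines():
        if line not in seen:
            seen.add(line)
            unique[line] = None      # first appearance: keep it tentatively
        else:
            unique.pop(line, None)   # repeated: drop it (no-op if already dropped)
    return list(unique)


sample = """Somewhere in China there is a copy of this vid.
2 years ago
Not sure really
Aiur Productions
Aiur Productions
2 years ago
"""

for line in lines_never_repeated(sample):
    print(line)
```

Membership tests on the set and removals from the dict are O(1) on average, so the whole pass is linear in the number of lines.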
Upvotes: 1
Reputation: 3308
You can solve your problem using this approach:
from collections import Counter
text = """<br/>
Somewhere in China there is a copy of this vid.<br/>
2 years ago<br/>
Not sure really<br/>
Aiur Productions<br/>
Aiur Productions<br/>
2 years ago<br/>
"""
str_counts = Counter(text.replace('<br/>', '').split('\n'))
result = '\n'.join([elem for elem in str_counts if str_counts[elem] == 1])
print(result)
# Somewhere in China there is a copy of this vid.
# Not sure really
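Note that text.split('\n') in the answer above yields an empty string at both ends of this sample; there happen to be exactly two, so their count is 2 and they are filtered out. If the text only ended with a newline, the single empty string would survive as a blank output line. A sketch of a reusable variant, assuming Python 3.7+ for Counter's insertion order; the function name remove_all_duplicates and its skip_blank option are hypothetical:

```python
from collections import Counter


def remove_all_duplicates(text, skip_blank=True):
    """Return the lines of `text` that occur exactly once, joined by newlines."""
    # splitlines() does not produce a trailing '' when text ends with a newline,
    # unlike text.split('\n').
    lines = text.splitlines()
    if skip_blank:
        # Hypothetical option: ignore blank lines, e.g. the one created by a
        # newline right after the opening triple quote.
        lines = [line for line in lines if line.strip()]
    counts = Counter(lines)
    # Counter keeps first-insertion order on Python 3.7+.
    return "\n".join(line for line in counts if counts[line] == 1)


text = """
Somewhere in China there is a copy of this vid.
2 years ago
Not sure really
Aiur Productions
Aiur Productions
2 years ago
"""

print(remove_all_duplicates(text))
```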
Upvotes: 1
Reputation: 21
from collections import Counter, OrderedDict

class OrderedCounter(Counter, OrderedDict):
    'Counter that remembers the order elements are first encountered'

    def __repr__(self):
        return '%s(%r)' % (self.__class__.__name__, OrderedDict(self))

    def __reduce__(self):
        return self.__class__, (OrderedDict(self),)

updated = []
for k, v in OrderedCounter(text.split('<br/>')).items():
    if v == 1:
        updated.append(k)
print('<br/>'.join(updated))
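Since Python 3.7, Counter (a dict subclass) remembers insertion order, so on modern Python the OrderedCounter recipe above is not needed just to keep lines in first-seen order. A minimal sketch with a plain Counter, assuming Python 3.7+ and the same <br/>-delimited text as in the question:

```python
from collections import Counter

text = """<br/>
Somewhere in China there is a copy of this vid.<br/>
2 years ago<br/>
Not sure really<br/>
Aiur Productions<br/>
Aiur Productions<br/>
2 years ago<br/>
"""

# Counter keys come out in the order they were first seen (Python 3.7+),
# so filtering on count == 1 keeps the original line order.
counts = Counter(text.split('<br/>'))
updated = [k for k, v in counts.items() if v == 1]
print('<br/>'.join(updated))
```

The OrderedCounter class is still useful on Python versions older than 3.7, or if you want its repr to show the ordering explicitly.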
Upvotes: 1
Reputation: 813
I am not 100% sure what you are asking, but I think you want to print all the lines except the ones that are repeated.
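lines = []   # distinct lines, in order of first appearance
delete = []  # lines that appeared more than once
for line in text.split("\n"):
    if line in lines:
        if line not in delete:
            delete.append(line)
    else:
        lines.append(line)
# keep only the lines that were never repeated
lines = [line for line in lines if line not in delete]
print("\n".join(lines))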
This code isn't the most efficient (membership tests on lists are linear), but it should convey the idea.
Upvotes: 0