Reputation: 117
My program compares two paragraphs and returns the lines they have in common as a list. It splits each paragraph into a list of lines and compares them, appending every matching line to a list. However, the result includes an empty string. Please help me figure out where it's coming from.
story1 = '''This is a story.
This has multiple lines.
All lines will be split.
This is the last line.
'''
story2 = '''This is a new story.
This has multiple lines.
All lines will be split.
This is the not last line.
This is a story.
'''
lines1 = story1.split("\n")
lines2 = story2.split("\n")
similarities = []
#print(lines1)
#print(lines2)
for line in lines1:
    if line in lines2:
        similarities.append(line)
print(similarities)
Upvotes: 0
Views: 36
Reputation: 7206
Define your story1 and story2 without a trailing newline to avoid the empty line, like:
story1 = '''This is a story.
This has multiple lines.
All lines will be split.
This is the last line.'''
Or you can exclude empty lines in the condition:
if line in lines2 and line != '':
Full code:
story1 = '''This is a story.
This has multiple lines.
All lines will be split.
This is the last line.'''
story2 = '''This is a new story.
This has multiple lines.
All lines will be split.
This is the not last line.
This is a story.'''
lines1 = story1.split("\n")
lines2 = story2.split("\n")
similarities = []
for line in lines1:
    #if line in lines2 and line != '':
    if line in lines2:
        similarities.append(line)
print(similarities)
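A minimal sketch of the second option: the stories keep their original trailing newline, and the `line != ''` check in the loop condition skips the empty string that the split produces.

```python
# Stories keep their trailing newline, exactly as in the question
story1 = '''This is a story.
This has multiple lines.
All lines will be split.
This is the last line.
'''
story2 = '''This is a new story.
This has multiple lines.
All lines will be split.
This is the not last line.
This is a story.
'''

lines1 = story1.split("\n")
lines2 = story2.split("\n")

similarities = []
for line in lines1:
    # Skip the empty string left over from the trailing "\n"
    if line in lines2 and line != '':
        similarities.append(line)

print(similarities)
# ['This is a story.', 'This has multiple lines.', 'All lines will be split.']
```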
Upvotes: 1
Reputation: 16
Good day to you, Kan.
The reason you find the empty string appended to your similarities list is that both of your stories end with a newline, so each one actually has an empty last line.
story1 = '''This is a story.
This has multiple lines.
All lines will be split.
This is the last line.'''
story2 = '''This is a new story.
This has multiple lines.
All lines will be split.
This is the not last line.
This is a story.'''
The above won't append an empty line as the trailing '\n' has been removed.
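If you would rather keep the stories as they are, a sketch assuming it is acceptable to drop trailing newlines before comparing: strip them with `str.rstrip("\n")` before splitting, so neither list ends with an empty string.

```python
# Stories keep their trailing newline, as in the question
story1 = '''This is a story.
This has multiple lines.
All lines will be split.
This is the last line.
'''
story2 = '''This is a new story.
This has multiple lines.
All lines will be split.
This is the not last line.
This is a story.
'''

# rstrip("\n") removes trailing newline characters before splitting,
# so no empty string appears at the end of either list
lines1 = story1.rstrip("\n").split("\n")
lines2 = story2.rstrip("\n").split("\n")

# Keep the lines of story1 that also occur in story2, in story1's order
similarities = [line for line in lines1 if line in lines2]
print(similarities)
```

Note that this only removes trailing newlines; a blank line in the middle of both stories would still be reported as a match.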
Upvotes: 0
Reputation: 2218
The output of lines1 and lines2:
In [2]: lines1
Out[2]:
['This is a story.',
'This has multiple lines.',
'All lines will be split.',
'This is the last line.',
'']
In [3]: lines2
Out[3]:
['This is a new story.',
'This has multiple lines.',
'All lines will be split.',
'This is the not last line.',
'This is a story.',
'']
Both lists end with an empty string, which is what you get when you split a multiline string that ends with "\n" on "\n". Since the empty string appears in both lists, it is counted as one of the "similarities".
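A small sketch that reproduces the effect on a two-line string: `split("\n")` leaves an empty string after the final newline, while `str.splitlines()` does not, so `splitlines()` is another way to avoid the empty entry.

```python
text = '''first line
second line
'''

# split("\n") treats the final "\n" as a separator, leaving '' at the end
print(text.split("\n"))   # ['first line', 'second line', '']

# splitlines() does not produce an empty string for a trailing newline
print(text.splitlines())  # ['first line', 'second line']
```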
Upvotes: 0