Reputation: 13
I am currently parsing resumes and want to remove "-" only when it appears at the beginning of a line. I've tried checking the first character of each line after splitting the text. Below is my code:
for line in text.split('\n'):
    if line[0] == "-":
        line[0] = line.replace('-', ' ')
line is a string. That is my reasoning, but every time I run this I get the error IndexError: string index out of range. I'm unsure why, because since it is a string, the first element should be recognized. Thank you!
Upvotes: 1
Views: 1811
Reputation: 12910
This is probably due to empty lines: an empty string has no element at index 0. Check that the line is non-empty before taking the index.
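text = "-testing\nabc\n\n\nxyz"
new_text = []
for line in text.split("\n"):
    # An empty string is falsy, so line[0] is only evaluated on non-empty lines
    if line and line[0] == '-':
        line = line[1:]  # drop the leading dash
    new_text.append(line)
print("\n".join(new_text))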
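To see exactly where the error comes from, the sketch below reuses the sample text above. Splitting on "\n" turns blank lines into empty strings, and indexing an empty string raises IndexError, while slicing (line[:1]) simply returns an empty string. Using the slice as the first-character test is shown here as an alternative to the explicit length check; it is not what the answer above uses.

```python
text = "-testing\nabc\n\n\nxyz"

# Blank lines become empty strings after splitting on "\n"
lines = text.split("\n")
print(lines)  # ['-testing', 'abc', '', '', 'xyz']

# Indexing an empty string fails...
try:
    lines[2][0]
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: string index out of range

# ...but slicing never raises; it just returns an empty string
print(repr(lines[2][:1]))  # ''

# So line[:1] == "-" is a first-character test that needs no length check
cleaned = [line[1:] if line[:1] == "-" else line for line in lines]
print(cleaned)  # ['testing', 'abc', '', '', 'xyz']
```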
Upvotes: 0
Reputation: 140168
You get the error because some lines are empty, and indexing an empty string with [0] raises IndexError.
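Then your replacement is wrong. Strings are immutable, so line[0] = ... raises a TypeError. Even if you rebind line instead, the new value is lost at the next iteration, and so is the original list of lines, by the way. If you want to remove the first character of a string, you don't need replace: just slice the string. That way you don't risk removing other dashes, since replace substitutes every occurrence by default.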
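Both problems are easy to reproduce in isolation. This sketch uses the "-yes--" line from the example below and only illustrates how str item assignment, str.replace, and slicing behave:

```python
line = "-yes--"

# Strings are immutable: item assignment raises TypeError
try:
    line[0] = " "
except TypeError as exc:
    print("TypeError:", exc)

# str.replace substitutes every occurrence by default
replaced_all = line.replace("-", " ")
print(repr(replaced_all))  # ' yes  '

# A count limits how many are replaced, but not *where*: this removes
# a trailing dash from a line that never started with one
print(repr("yes-".replace("-", "", 1)))  # 'yes'

# Slicing drops exactly the first character and nothing else
print(repr(line[1:]))  # 'yes--'
```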
A working solution is to test each line with startswith, build a new list of strings, and then join them back:
text = """hello
-yes--
who are you"""
new_text = []
for line in text.splitlines():
    if line.startswith("-"):
        line = line[1:]
    new_text.append(line)
print("\n".join(new_text))
result:
hello
yes--
who are you
With more experience, you can pack this code into a list comprehension:
new_text = "\n".join([line[1:] if line.startswith("-") else line for line in text.splitlines()])
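On Python 3.9 or later, str.removeprefix does the startswith-plus-slice step in a single call. The sketch below (the "--double" line is a made-up addition to the sample text) contrasts it with str.lstrip, which strips every leading dash rather than just one:

```python
text = """hello
-yes--
--double
who are you"""

lines = text.splitlines()

# Removes exactly one leading dash, like the startswith/slice version (Python 3.9+)
one_dash = [line.removeprefix("-") for line in lines]

# lstrip removes ALL leading dashes, which may or may not be what you want
all_dashes = [line.lstrip("-") for line in lines]

print(one_dash)    # ['hello', 'yes--', '-double', 'who are you']
print(all_dashes)  # ['hello', 'yes--', 'double', 'who are you']
```

Which one fits depends on whether a resume line like "--item" should keep its second dash.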
Finally, the regular expression module is also a nice alternative:
import re
print(re.sub("^-","",text,flags=re.MULTILINE))
This removes the leading dash from every line that starts with one. The re.MULTILINE flag tells the regex engine to treat ^ as the start of each line, not just the start of the whole string.
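To see what the flag changes, here is a small sketch (with a made-up sample whose first line does not start with a dash) that runs the same substitution with and without re.MULTILINE:

```python
import re

text = """hello
-yes--
-who are you"""

# Without MULTILINE, ^ only matches at the very start of the string,
# and the first line starts with "h", so nothing is removed
plain = re.sub("^-", "", text)

# With MULTILINE, ^ also matches right after every newline
multi = re.sub("^-", "", text, flags=re.MULTILINE)

print(plain)
print(multi)
```

Pass flags as a keyword argument: the fourth positional parameter of re.sub is count, so re.sub("^-", "", text, re.MULTILINE) would treat the flag value as a replacement count rather than a flag.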
Upvotes: 4