Reputation: 33
I have a txt file with 40000 lines. Each line contains comma-separated numbers. I want to remove a specific number, for example 233, from lines 36000 to 39000, but I don't want to remove the substring 233 from a number such as 23341.
Here is my code so far:
with open("example.txt", "r") as file:
    newline = []
    i = 0
    for l in file.readlines():
        if i >= 36000 and i <= 39000:
            newline.append(l.replace("233", ""))
        else:
            newline.append(l.replace("233", "233"))
        i = i + 1
with open("example.txt", "w") as file:
    for line in newline:
        file.writelines(line)
Is there a more elegant way to solve this problem?
Upvotes: 1
Views: 45
Reputation: 92854
Reading every line of a large text file into a list and then overwriting the whole file is inefficient. Use the fileinput module and a regex pattern precompiled with re.compile instead:
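import fileinput, re

# inplace=True redirects print() into the file being read.
# The encoding argument needs Python 3.10 or later.
with fileinput.input('example.txt', inplace=True, encoding='utf-8') as f:
    pat = re.compile(r'\b233\b')
    for i, line in enumerate(f):
        if i >= 36000 and i <= 39000:
            line = pat.sub('', line)
        # Print every line, changed or not, so nothing is dropped from the file.
        print(line, end='')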
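To see what the word-boundary pattern does to a single line, here is a small sketch. The sample lines are made up for illustration and are not from the asker's file. Note that the pattern removes only the digits, so the surrounding commas stay and the field ends up empty:

```python
import re

pat = re.compile(r'\b233\b')

# Hypothetical sample lines, chosen to show matches and near misses.
samples = ["1,233,5\n", "23341,7\n", "233,2330,1233\n"]

# Apply the same substitution that the loop above applies per line.
results = [pat.sub('', s) for s in samples]

for r in results:
    print(repr(r))
```

Only the standalone 233 is touched; 23341, 2330 and 1233 are left alone because there is no word boundary inside them.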
Upvotes: 1
Reputation: 521427
You may use a regex replacement here:
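import re

for line in file.readlines():
    if i >= 36000 and i <= 39000:
        # Strip the newline first so that a trailing comma can be removed, then restore it.
        line = re.sub(r',?\b233\b,?', ',', line.rstrip('\n')).strip(',') + '\n'
    newline.append(line)
    i = i + 1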
The above regex logic targets specifically the value 233 as a CSV value. The pattern and replacement ensure that the resulting CSV has no empty values or dangling commas.
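If you would rather avoid regex, the same idea can probably be done by splitting each line on commas and dropping the fields that equal the target. This is only a minimal sketch: drop_value is a hypothetical helper name, and it assumes the fields are plain numbers without quoting. One thing in its favour is that it should also cope with the value appearing twice in a row:

```python
def drop_value(line, value="233"):
    """Remove every field equal to `value` from one comma-separated line."""
    # Set the line ending aside so it does not stick to the last field.
    body = line.rstrip("\r\n")
    ending = line[len(body):]
    # Compare whole fields, so 23341 never matches 233.
    kept = [field for field in body.split(",") if field.strip() != value]
    return ",".join(kept) + ending


print(repr(drop_value("1,233,233,5\n")))
print(repr(drop_value("23341,233\n")))
```

Inside the loop you would call it only for the lines in the 36000 to 39000 range and append the other lines unchanged.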
Upvotes: 1