Ken

Reputation: 902

Modifying file contents using regular expressions in Python

I've been trying to remove the numbering from the following lines using a Python script.

jokes.txt:

  1. It’s hard to explain puns to kleptomaniacs because they always take things literally.

  2. I used to think the brain was the most important organ. Then I thought, look what’s telling me that.

When I run this Python script:

import re
with open('jokes.txt', 'r+') as original_file:
    modfile = original_file.read()
    modfile = re.sub("\d+\. ", "", modfile)
    original_file.write(modfile)

The numbers are still there, and the result gets appended to the original contents like this:

  1. It’s hard to explain puns to kleptomaniacs because they always take things literally.

  2. I used to think the brain was the most important organ. Then I thought, look what’s telling me that.1. It’s hard to explain puns to kleptomaniacs because they always take things literally.਍ഀ਍ഀ2. I used to think the brain was the most important organ. Then I thought, look what’s telling me that.

I guess the regular expression re.sub("\d+\. ", "", modfile) finds all the digits from 0-9 and replaces them with an empty string.

As a novice, I'm not sure where I messed up. I'd like to know why this happens and how to fix it.

Upvotes: 0

Views: 91

Answers (1)

Rob Watts

Reputation: 7146

You've opened the file for reading and writing, but after reading it in you start writing without specifying where to write. Writing therefore begins where reading left off: at the end of the file.
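To see that behaviour in isolation, here is a minimal sketch. The scratch file demo.txt and its contents are made up for the demonstration; only the read-then-write pattern comes from the question.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as f:
    f.write("1. first\n")

with open(path, "r+") as f:
    text = f.read()   # reading everything leaves the position at the end
    f.write("X")      # so this write lands after the old contents

with open(path) as f:
    result = f.read()

print(repr(result))   # the old text is still there, with "X" tacked on
```

The original line survives and the new data follows it, which is the same pattern as the doubled jokes in the question.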

Short of closing the file and re-opening it just for writing, here's a way to overwrite its contents in place:

import re
with open('jokes.txt', 'r+') as original_file:
    modfile = original_file.read()
    # Raw string so the backslashes reach the regex engine untouched
    modfile = re.sub(r"\d+\. ", "", modfile)
    original_file.seek(0) # Return to start of file
    original_file.truncate() # Clear out the old contents
    original_file.write(modfile)
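For comparison, the close-and-reopen approach mentioned above could look roughly like this. The function name strip_numbering is just a label I picked for the sketch, not anything from the question.

```python
import re

def strip_numbering(path):
    """Remove 'N. ' numbering from the file at path (hypothetical helper)."""
    # Read the whole file first; the with block closes it afterwards.
    with open(path, "r") as src:
        text = src.read()

    text = re.sub(r"\d+\. ", "", text)

    # Opening with 'w' truncates the file, so no seek() or truncate() is needed.
    with open(path, "w") as dst:
        dst.write(text)
```

Calling strip_numbering('jokes.txt') would then rewrite the file. It opens the file twice, but there is no file position to keep track of.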

I don't know why the numbers were still there in the part that you appended, as this worked just fine for me. You might want to add a caret (^) to the start of your regex, giving "^\d+\. ", and pass flags=re.MULTILINE to re.sub. With that flag a caret matches at the start of every line (without it, only at the start of the whole string), so if one of your jokes happens to contain something like 1. in its text, the number at the beginning of the line is removed but the number inside the joke is left alone.
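A small sketch of the anchored version, using made-up sample text rather than the real jokes.txt. The [ \t]* part is my own addition, on the assumption that the numbers might be indented; drop it if they never are.

```python
import re

# Made-up sample: each line starts with a number and also contains one.
text = "1. Step 1. is the hardest.\n\n2. I counted to 10. Then I stopped.\n"

# Unanchored: every "digits, period, space" run is removed, even mid-sentence.
unanchored = re.sub(r"\d+\. ", "", text)

# Anchored: with re.MULTILINE, ^ matches at the start of each line, so only
# the leading number goes. [ \t]* (an assumption) tolerates indentation.
anchored = re.sub(r"^[ \t]*\d+\. ", "", text, flags=re.MULTILINE)

print(unanchored)
print(anchored)
```

Only the anchored version keeps the "1." and "10." that appear inside the sentences.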

Upvotes: 5
