Ben

Reputation: 893

String Replacement and Saving to a New File (Python v2.7)

I am trying to replace all lines of a certain format in a file with blanks, i.e. replace a line of number/number/number (like a date) or number:number (like a time) with "". I want to read from the old file and then save the scrubbed version as a new file.

This is the code I have so far (I know it is way off):

old_file = open("old_text.txt", "r")
new_file = open("new_text.txt", "w")

print (old_file.read())

for line in old_file.readlines():
    cleaned_line = line.replace("%/%/%", "")
    cleaned_line = line.replace("%:%", "")
    new_file.write(cleaned_line)

old_file.close
new_file.close

Thank you for your help, Ben

Upvotes: 2

Views: 1168

Answers (2)

abarnert

Reputation: 365697

I am trying to replace all lines of a certain format in a file with blanks, i.e. replace a line of number/number/number (like a date) or number:number (like a time) with "".

You can't use str.replace to match a pattern or format, only a literal string.
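As a quick illustration of that point, here is a minimal sketch (the sample line is made up for the example). The "%" characters are searched for literally, so the first call finds nothing to replace, while a literal substring is replaced as expected:

```python
# Hypothetical sample line, for illustration only.
line = "Logged 12/25/2013 at 10:30\n"

# str.replace searches for the exact characters "%/%/%"; "%" is not a wildcard,
# and that text never occurs in the line, so nothing changes.
unchanged = line.replace("%/%/%", "")

# A literal substring, on the other hand, is found and removed.
literal = line.replace("12/25/2013", "")

print(repr(unchanged))
print(repr(literal))
```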

To match a pattern, you need some kind of parser. For patterns like this, the regular expression engine built into the standard library as re is more than powerful enough… but you will need to learn how to write regular expressions for your patterns. The reference docs and Regular Expression HOWTO are great if you already know the basics; if not, you should search for a tutorial elsewhere.

Anyway, here's how you'd do this (fixing a few other things along the way, most of them explained by Lego Stormtroopr):

import re

with open("old_text.txt") as old_file, open("new_text.txt", "w") as new_file:
    for line in old_file:
        cleaned_line = re.sub(r'\d+/\d+/\d+', '', line)
        cleaned_line = re.sub(r'\d+:\d+', '', cleaned_line)
        new_file.write(cleaned_line)

Also, note that I used cleaned_line in the second sub; just using line again, as in your original code, means we lose the results of the first substitution.

Without knowing the exact definition of your problem, I can't promise that this does exactly what you want. Do you want to blank all lines that contain the pattern number/number/number, blank out all lines that are nothing but that pattern, or blank out just that pattern and leave the rest of the line alone? All of those things are doable, and pretty easy, with re, but they're all done a little differently.
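To make the difference concrete, here is a rough sketch of the three readings for the date pattern. The function names are made up for this example, and keeping the trailing newline on a blanked line is my assumption about what "blank" should mean:

```python
import re

DATE = r"\d+/\d+/\d+"  # one or more digits, "/", digits, "/", digits

def strip_pattern(line):
    """Remove only the matched text and keep the rest of the line."""
    return re.sub(DATE, "", line)

def blank_if_contains(line):
    """Blank the whole line if the pattern appears anywhere in it."""
    return "\n" if re.search(DATE, line) else line

def blank_if_only(line):
    """Blank the line only if it is nothing but the pattern (plus whitespace)."""
    return "\n" if re.match(DATE + r"\s*$", line) else line
```

For a line like "due 1/2/2013 ok", the first removes just the date, the second blanks the whole line, and the third leaves it untouched; only a line that is just "1/2/2013" gets blanked by the third.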


If you want to get a little trickier, you can use a single re.sub expression to replace all of the matching lines with blank lines at once, instead of iterating over them one at a time. That means a slightly more complicated regexp in exchange for slightly simpler Python code. Because the whole file has to be read into memory, it probably means better performance for mid-sized files but worse performance (and an upper limit) for huge files. If you can't figure out how to write the appropriate expression yourself, and there's no performance bottleneck to fix, I'd stick with explicit looping.
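For reference, one way the single-expression version might look. This is a sketch that assumes you want to blank lines consisting of nothing but a date or a time; scrub_text and scrub_file are names I made up for the example:

```python
import re

# A line that is only a date or only a time, optionally followed by spaces/tabs.
# re.MULTILINE makes ^ and $ match at the start and end of every line,
# not just at the start and end of the whole string.
LINE_PATTERN = re.compile(r"^(?:\d+/\d+/\d+|\d+:\d+)[ \t]*$", re.MULTILINE)

def scrub_text(text):
    """Blank every matching line in one pass; newlines are left in place."""
    return LINE_PATTERN.sub("", text)

def scrub_file(old_path, new_path):
    """Read the whole old file into memory, scrub it, and write the new file."""
    with open(old_path) as old_file:
        text = old_file.read()
    with open(new_path, "w") as new_file:
        new_file.write(scrub_text(text))
```

Note that old_file.read() pulls the entire file into memory, which is where the upper limit for huge files comes from.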

Upvotes: 2

user764357

Reputation:

Firstly, there are some indentation issues, where the for loop was indented for no reason. Secondly, as soon as you call old_file.read() you have read through to the end of the file, so there are no more lines left for readlines() to return. Lastly, the with statement lets you open a file and bind it to a variable name, and it closes the file for you when the block ends, whether normally or because of an error, so you don't have to close it manually.
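The second point is easy to check for yourself. In this small sketch, io.StringIO stands in for a file opened in text mode (an assumption made so the example needs no file on disk); a real file object behaves the same way:

```python
import io

# In-memory stand-in for an open text file, for illustration only.
old_file = io.StringIO(u"12/25/2013\nhello\n")

contents = old_file.read()         # reads everything, leaving the position at the end
after_read = old_file.readlines()  # nothing left to read, so this is an empty list

old_file.seek(0)                   # rewind to the start
after_seek = old_file.readlines()  # now the lines are available again

print(after_read)
print(after_seek)
```

So either drop the print(old_file.read()) line, or call old_file.seek(0) before looping.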

To perform the actual logic, however, you probably want to use a regular expression. You can use re.search() to find these patterns:

  • \d+:\d+ for one or more digits, a colon, and one or more digits
  • \d+/\d+/\d+ for three runs of one or more digits, with a literal / between them (the / does not need escaping in Python).

The code you want is closer to this:

import re

with open("old_text.txt", "r") as old_file, open("new_text.txt", "w") as new_file:
    for line in old_file:
        # Blank the line if a time (digits:digits) appears anywhere in it
        if re.search(r"\d+:\d+", line) is not None:
            line = ""
        # Blank the line if a date (digits/digits/digits) appears anywhere in it
        if re.search(r"\d+/\d+/\d+", line) is not None:
            line = ""
        new_file.write(line)

If you only want to match at the beginning of the line, re.match() will probably be a better choice.
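A small sketch of the difference, using two made-up sample lines: re.search() scans the whole string, while re.match() only succeeds if the pattern matches starting at the first character.

```python
import re

TIME = r"\d+:\d+"

# Hypothetical sample lines, for illustration only.
line_a = "10:30 standup\n"      # time at the start of the line
line_b = "standup at 10:30\n"   # time in the middle of the line

search_a = re.search(TIME, line_a) is not None  # found
search_b = re.search(TIME, line_b) is not None  # found, search scans the whole line
match_a = re.match(TIME, line_a) is not None    # found, the line starts with a time
match_b = re.match(TIME, line_b) is not None    # not found, the line starts with "s"

print(search_a, search_b, match_a, match_b)
```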

Here we declare a block with our two files, loop through old_file, clean each line, and write it to new_file. Once the end of old_file is reached, both files are closed cleanly. If an error occurs inside the block, the with statement still closes both files before the exception propagates. It does not suppress the error, so a missing old_text.txt will still raise an error when open() is called.
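If you want to see what with does when something goes wrong, here is a minimal sketch. The scratch file and the simulated ValueError are made up for the example; the point is that the file ends up closed and the exception still reaches the surrounding code:

```python
import os
import tempfile

# Create a small scratch file so the example is self-contained (hypothetical path).
path = os.path.join(tempfile.mkdtemp(), "old_text.txt")
with open(path, "w") as scratch:
    scratch.write("12/25/2013\n")

error_seen = False
try:
    with open(path) as old_file:
        # Simulate a failure in the middle of processing.
        raise ValueError("simulated failure while processing")
except ValueError:
    # The with statement did not swallow the exception; we had to handle it here.
    error_seen = True

# The file was closed on the way out of the with block, despite the error.
closed_after_error = old_file.closed

print(error_seen, closed_after_error)
```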

Upvotes: 0
