Josh

Reputation: 681

Using multiple re.sub() calls in one file with Python

I have a file with a large number of random strings in it. There are certain patterns that I want to remove, so I decided to use regex to find them. So far, this code does exactly what I want:

#!/usr/bin/env python3

import csv
import re

# Pass 1: remove "@ word" patterns from the first column of each row
# and write the result to an intermediate file.
with open('retweet.csv', newline='') as inputfile, open('output.csv', 'w') as f:
    read = csv.reader(inputfile, delimiter=',')
    for row in read:
        f.write(re.sub(r'@\s\w+', ' ', row[0]))
        f.write("\n")

# Pass 2: reread the intermediate file, turn every non-alphanumeric
# character into a space, and build a search URL from the remaining words.
with open('output.csv', newline='') as inputfile2, open('output2.csv', 'w') as f:
    read2 = csv.reader(inputfile2, delimiter='\n')
    for row in read2:
        a = re.sub('[^a-zA-Z0-9]', ' ', row[0])
        b = a.split()
        c = "+".join(b)
        f.write("http://www.google.com/webhp#q=" + c + "&btnI\n")

The problem is that I would like to avoid writing and re-reading an intermediate file for each pattern, as this gets messy once I need to check for more patterns. How can I apply multiple re.sub() calls to the same file and write a new file with all the substitutions applied?

Thanks for any help!

Upvotes: 0

Views: 1731

Answers (1)

Martijn Pieters

Reputation: 1123620

Apply all your substitutions in one go on the current line:

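import csv
import re

with open('retweet.csv', newline='') as inputfile, open('output.csv', 'w') as f:
    read = csv.reader(inputfile, delimiter=',')
    for row in read:
        text = row[0]
        text = re.sub(r'@\s\w+', ' ', text)
        text = re.sub(r'[^a-zA-Z0-9]', ' ', text)  # e.g. the question's second pattern
        # etc.: one re.sub() per additional pattern, each feeding the next
        f.write(text + '\n')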
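If the list of patterns keeps growing, one option (a sketch, not part of the original answer) is to keep them in a list of (compiled pattern, replacement) pairs and apply them in order inside the loop. The names `SUBSTITUTIONS` and `apply_substitutions` are hypothetical; the two entries are the patterns from the question.

```python
import re

# Hypothetical table of (compiled pattern, replacement) pairs, applied in order.
# The two entries are the patterns used in the question; append more as needed.
SUBSTITUTIONS = [
    (re.compile(r'@\s\w+'), ' '),        # drop "@ name" sequences
    (re.compile(r'[^a-zA-Z0-9]'), ' '),  # replace every non-alphanumeric with a space
]


def apply_substitutions(text, substitutions=SUBSTITUTIONS):
    """Run each pattern.sub() over text in sequence and return the result."""
    for pattern, replacement in substitutions:
        text = pattern.sub(replacement, text)
    return text
```

Inside the loop above, the body then reduces to `f.write(apply_substitutions(row[0]) + '\n')`. Order matters: once the second pattern has turned `@` into a space, the first one could no longer match, so the list should be ordered the same way the individual re.sub() calls would be.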

Note that reading a file with csv.reader(..., delimiter='\n') strongly suggests you are treating that file as a plain sequence of lines; you could just loop over the file directly:

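with open('output.csv') as inputfile2, open('output2.csv', 'w') as f:
    for line in inputfile2:
        words = re.sub(r'[^a-zA-Z0-9]', ' ', line).split()
        f.write("http://www.google.com/webhp#q=" + "+".join(words) + "&btnI\n")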
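Combining both suggestions, the intermediate output.csv can be dropped entirely: read each CSV row, apply every substitution, and write the finished URL straight to the final file. This is a minimal sketch, assuming Python 3 and that only the first column matters, as in the question; the function name `build_urls` is hypothetical, and it takes file-like objects so it can be tried out with `io.StringIO`.

```python
import csv
import io
import re

# Patterns from the question, applied in this order to each row's first column.
SUBSTITUTIONS = [
    (re.compile(r'@\s\w+'), ' '),
    (re.compile(r'[^a-zA-Z0-9]'), ' '),
]


def build_urls(infile, outfile):
    """Read CSV rows from infile and write one search URL per row to outfile."""
    for row in csv.reader(infile):
        if not row:  # csv.reader yields [] for blank lines; skip them
            continue
        text = row[0]
        for pattern, replacement in SUBSTITUTIONS:
            text = pattern.sub(replacement, text)
        query = '+'.join(text.split())  # collapse runs of spaces into "+"
        outfile.write('http://www.google.com/webhp#q=' + query + '&btnI\n')


# Example with in-memory files; for real files, pass
# open('retweet.csv', newline='') and open('output2.csv', 'w') instead.
src = io.StringIO('RT @ someone: hello world,123\n\nplain text,456\n')
dst = io.StringIO()
build_urls(src, dst)
print(dst.getvalue())
```

Each input file is read once and each output file written once, no matter how many patterns are added to `SUBSTITUTIONS`.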

Upvotes: 3
