Reputation: 49
I want to find and replace all of the Managerial positions in a CSV file with number 3. The list contains different positions from simple ",Manager," to ",Construction Project Manager and Project Superintendent," but all of them are placed between two commas. I wrote this to find them all:
[,\s]?([A-Za-z. '\s/()\"]+)?(Manager|manager)([A-Za-z. '\s/()]+)?,
The Problem is that sometimes a comma is common between two adjacent Managrial position. So I need to include comma when I want to find the positions but I need to exclude it when I want to replace the position with 3! How Can I do that with a regular expression in Python?
Here is the CSV file.
Upvotes: 1
Views: 1379
Reputation: 18950
I suggest using Python's built-in CSV module instead. Let's not reinvent the wheel here and consider handling CSV as a solved problem.
Here is some sample code that demonstrates how it can be done: The csv module is responsible for reading and writing the file with the correct delimiter and quotation char.
re.search is used to search individual cells/columns for your keyword. If manager
is found, put a 3, otherwise, put the original content and write the row back when done.
import csv, sys, re
infile= r'in.csv'
outfile= r'out.csv'
o = open(outfile, 'w', newline='')
csvwri = csv.writer(o, delimiter=',', quotechar='\"', quoting=csv.QUOTE_MINIMAL)
with open(infile, newline='') as f:
reader = csv.reader(f, delimiter=',', quotechar='\"', quoting=csv.QUOTE_MINIMAL)
try:
for row in reader:
newrow = []
for col in row:
if re.search("manager", col, re.I):
newrow.append("3")
else:
newrow.append(col)
csvwri.writerow(newrow)
except csv.Error as e:
sys.exit('file {}, line {}: {}'.format(infile, reader.line_num, e))
o.flush()
o.close()
Straightforward and clean, I would say.
If you insist on using a regex, here's an improved pattern:
[,\s]?([A-Za-z. '\s/()\"]+)?(Manager|manager)([A-Za-z. '\s/()]+)?(?=,)
Replace with 3, as shown in the demo.
However, I believe you are still better off with the csv lib approach.
Upvotes: 2