Mehrdad Fonooni
Mehrdad Fonooni

Reputation: 49

Regular expression search and substitute values in CSV file

I want to find and replace all of the Managerial positions in a CSV file with number 3. The list contains different positions from simple ",Manager," to ",Construction Project Manager and Project Superintendent," but all of them are placed between two commas. I wrote this to find them all:

[,\s]?([A-Za-z. '\s/()\"]+)?(Manager|manager)([A-Za-z. '\s/()]+)?,

The Problem is that sometimes a comma is common between two adjacent Managrial position. So I need to include comma when I want to find the positions but I need to exclude it when I want to replace the position with 3! How Can I do that with a regular expression in Python?

Here is the CSV file.

Upvotes: 1

Views: 1379

Answers (1)

wp78de
wp78de

Reputation: 18950

I suggest using Python's built-in CSV module instead. Let's not reinvent the wheel here and consider handling CSV as a solved problem.

Here is some sample code that demonstrates how it can be done: The csv module is responsible for reading and writing the file with the correct delimiter and quotation char. re.search is used to search individual cells/columns for your keyword. If manager is found, put a 3, otherwise, put the original content and write the row back when done.

import csv, sys, re

infile= r'in.csv'
outfile= r'out.csv'
o = open(outfile, 'w', newline='')
csvwri = csv.writer(o, delimiter=',', quotechar='\"', quoting=csv.QUOTE_MINIMAL)

with open(infile, newline='') as f:
    reader = csv.reader(f, delimiter=',', quotechar='\"', quoting=csv.QUOTE_MINIMAL)
    try:
        for row in reader:
            newrow = []
            for col in row:
                if re.search("manager", col, re.I):
                    newrow.append("3")
                else:
                    newrow.append(col)
            csvwri.writerow(newrow)
    except csv.Error as e:
        sys.exit('file {}, line {}: {}'.format(infile, reader.line_num, e))

o.flush()
o.close()

Straightforward and clean, I would say.

If you insist on using a regex, here's an improved pattern:

[,\s]?([A-Za-z. '\s/()\"]+)?(Manager|manager)([A-Za-z. '\s/()]+)?(?=,)

Replace with 3, as shown in the demo.

However, I believe you are still better off with the csv lib approach.

Upvotes: 2

Related Questions