fsperrle

Reputation: 1302

Replace value from CSV with Python

I have to replace values in a large CSV file and have decided to use Python for the job.

The value I need to change is the first on each line in my comma separated CSV:

ToReplace, a1, a2, ..., aN
1, ab, cd, ..., xy
80, ka, kl, ..., df

It's always a number, but the number of digits isn't fixed.

I've got two ideas at the moment: Process the data line by line and ...

  1. Use a regular expression to match the number
  2. Use the CSV component to parse the line
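
For a single line, the two ideas above could be sketched roughly as follows. This is only an illustration for Python 3: the helper names `replace_with_regex` and `replace_with_csv` and the replacement value are hypothetical, and the regex assumes the number sits at the very start of the line.

```python
import csv
import io
import re

# Matches the run of digits at the very start of a line.
LEADING_NUMBER = re.compile(r'^\d+')


def replace_with_regex(line, new_value):
    """Idea 1: substitute the leading number with a regular expression."""
    return LEADING_NUMBER.sub(new_value, line, count=1)


def replace_with_csv(line, new_value):
    """Idea 2: parse the line with the csv module and rewrite field 0."""
    row = next(csv.reader([line]))
    row[0] = new_value
    out = io.StringIO()
    csv.writer(out).writerow(row)
    # The writer appends its own line terminator; drop it again here.
    return out.getvalue().rstrip('\r\n')
```

The regex version leaves a line untouched when it does not start with a digit (for example the header row), whereas the csv version replaces the first field unconditionally.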

As I'm very new to Python there are some questions that came to mind:

Upvotes: 0

Views: 817

Answers (2)

logc

Reputation: 3923

You can pass a second argument to Python's split method in order to get just the first match, replace that with whatever you want, then join back into a single string, like this:

import logging

with open('example.csv', 'rb') as infile, \
        open('result.csv', 'wb') as outfile:
    for line in infile:
        try:
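            # The files are opened in binary mode, so use bytes literals.
            # Split only at the first comma: first field and remainder.
            number, rest = line.split(b',', 1)
            number = b'blob'  # the replacement value
            outfile.write(b','.join([number, rest]))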
        except ValueError:
            logging.error('The following line had no separator: %s', line)
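
To see why the `try`/`except ValueError` is there: with at most one split, a line that contains a separator yields exactly two parts, while a line without one yields a single part, so unpacking into two names fails. A small sketch with made-up sample lines:

```python
# Sample lines are illustrative only.
with_separator = '80, ka, kl, df\n'
without_separator = 'no separator here\n'

# At most one split: the first field, then everything after the first comma.
parts = with_separator.split(',', 1)

try:
    number, rest = without_separator.split(',', 1)
    unpacked = True
except ValueError:
    # Only one element came back, so unpacking into two names fails.
    unpacked = False
```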

For 10 million rows, on 2 cores at 2.4 GHz with 8 GB of RAM, I get the following times:

$ time python example.py

real    0m20.771s
user    0m20.336s
sys     0m0.369s

Upvotes: 0

jfs

Reputation: 414149

If you want to replace the first column, which always contains a number, you could use a string method instead of the more general csv module to avoid parsing the whole line:

#!/usr/bin/env python

def main():
    with open('50gb_file', 'rb') as file, open('output', 'wb') as output_file:
        for line in file:
            number, sep, rest = line.partition(b',')
            try:
                number = int(number)*2 #XXX replace number here
            except ValueError:
                pass # don't replace the number
            else:
                # Encode the digits; on Python 3, bytes(n) gives n zero bytes.
                line = str(number).encode('ascii') + sep + rest
            output_file.write(line)

main()
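
The per-line logic can be tried in memory before running it on a large file. The following is a minimal sketch for Python 3, assuming every data line contains at least one comma; the helper name `double_first_field` is hypothetical, and doubling merely stands in for whatever replacement is actually needed.

```python
def double_first_field(line):
    """Return `line` (bytes) with its leading integer field doubled.

    Lines whose first field is not an integer, such as a header row,
    are returned unchanged.
    """
    number, sep, rest = line.partition(b',')
    try:
        value = int(number) * 2  # placeholder transformation
    except ValueError:
        return line
    return str(value).encode('ascii') + sep + rest


# Made-up sample rows modelled on the question's layout.
sample = [b'ToReplace, a1, a2\n', b'1, ab, cd\n', b'80, ka, kl\n']
result = [double_first_field(line) for line in sample]
```

Keeping the transformation in a small function like this makes it easy to check the header and edge cases separately from the file handling.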

Upvotes: 2
