Luleo_Primoc
Luleo_Primoc

Reputation: 105

How do you delete a column of values in a csv file but not the first item?

I have a piece of code that generates a flawed CSV file that I cannot fix. I can however fix this after the fact. Here is the CSV file:

column_a, column_b, column_c, column_d
1, 2_a, 3_b, 4_c, 5_d
1, 2_a, 3_b, 4_c, 5_d
1, 2_a, 3_b, 4_c, 5_d
1, 2_a, 3_b, 4_c, 5_d

I would like to skip the first row and delete all the '1's in the csv file. I have the following code to try and remedy this but I cannot save this file for some reason. I want to edit the file, not create a new one and output results. Here is the code I am toying with:

import csv

file = 'file.csv'

csv_file = open(file)
csv_reader = csv.reader(csv_file)
next(csv_reader)  # Skip first row

for row in csv_reader:
    del row[0]
    
csv_reader.close()

It seems so simple yet I cannot manage to save these changes without maybe outputting to a separate file.

Upvotes: 2

Views: 812

Answers (2)

martineau
martineau

Reputation: 123393

Updating the file is going to involve rewriting the whole thing. The code below shows one way of accomplishing this which involves initially writing all the changes into a separate temporary file, and then replacing the original file with it after all the changes have been written to the temporary one.

You can only avoid writing a separate file by reading the entire file into memory, making the changes, and then overwriting the original file with them.

To avoid deleting the column from the header row, it's simply handled separately at the very beginning. The code below illustrates how to do everything:

import csv
import os
from pathlib import Path
from tempfile import NamedTemporaryFile


filepath = Path('file.csv')

with open(filepath, 'r', newline='') as csv_file, \
     NamedTemporaryFile('w', newline='', dir=filepath.parent,
                        delete=False) as tmp_file:

    csv_reader = csv.reader(csv_file)
    csv_writer = csv.writer(tmp_file)

    # First copy the header.
    header = next(csv_reader)
    csv_writer.writerow(header)

    # Copy rows of data leaving out first column.
    for row in csv_reader:
        csv_writer.writerow(row[1:])

# Replace original file with updated version.
os.replace(tmp_file.name, filepath)

print('fini')

Upvotes: 3

liorr
liorr

Reputation: 800

This should give you what you need, I think:

import pandas as pd

old_csv = pd.read_csv('path/to/old/csv')

old_csv.to_csv('path/to/new/csv', header=False, index = False)

Old csv:

column_a, column_b, column_c, column_d
1, 2_a, 3_b, 4_c, 5_d
1, 2_a, 3_b, 4_c, 5_d
1, 2_a, 3_b, 4_c, 5_d
1, 2_a, 3_b, 4_c, 5_d

New csv (column of 1s removed, headers skipped):

2_a, 3_b, 4_c, 5_d
2_a, 3_b, 4_c, 5_d
2_a, 3_b, 4_c, 5_d
2_a, 3_b, 4_c, 5_d

Upvotes: 1

Related Questions