Reputation: 105
I have a piece of code that generates a flawed CSV file that I cannot fix. I can however fix this after the fact. Here is the CSV file:
column_a, column_b, column_c, column_d
1, 2_a, 3_b, 4_c, 5_d
1, 2_a, 3_b, 4_c, 5_d
1, 2_a, 3_b, 4_c, 5_d
1, 2_a, 3_b, 4_c, 5_d
I would like to skip the first row and delete all the '1's in the csv file. I have the following code to try and remedy this but I cannot save this file for some reason. I want to edit the file, not create a new one and output results. Here is the code I am toying with:
import csv
file = 'file.csv'
csv_file = open(file)
csv_reader = csv.reader(csv_file)
next(csv_reader) # Skip first row
for row in csv_reader:
del row[0]
csv_reader.close()
It seems so simple yet I cannot manage to save these changes without maybe outputting to a separate file.
Upvotes: 2
Views: 812
Reputation: 123393
Updating the file is going to involve rewriting the whole thing. The code below shows one way of accomplishing this which involves initially writing all the changes into a separate temporary file, and then replacing the original file with it after all the changes have been written to the temporary one.
You can only avoid writing a separate file by reading the entire file into memory, making the changes, and then overwriting the original file with them.
To avoid deleting the column from the header row, it's simply handled separately at the very beginning. The code below illustrates how to do everything:
import csv
import os
from pathlib import Path
from tempfile import NamedTemporaryFile
filepath = Path('file.csv')
with open(filepath, 'r', newline='') as csv_file, \
NamedTemporaryFile('w', newline='', dir=filepath.parent,
delete=False) as tmp_file:
csv_reader = csv.reader(csv_file)
csv_writer = csv.writer(tmp_file)
# First copy the header.
header = next(csv_reader)
csv_writer.writerow(header)
# Copy rows of data leaving out first column.
for row in csv_reader:
csv_writer.writerow(row[1:])
# Replace original file with updated version.
os.replace(tmp_file.name, filepath)
print('fini')
Upvotes: 3
Reputation: 800
This should give you what you need, I think:
import pandas as pd
old_csv = pd.read_csv('path/to/old/csv')
old_csv.to_csv('path/to/new/csv', header=False, index = False)
Old csv:
column_a, column_b, column_c, column_d
1, 2_a, 3_b, 4_c, 5_d
1, 2_a, 3_b, 4_c, 5_d
1, 2_a, 3_b, 4_c, 5_d
1, 2_a, 3_b, 4_c, 5_d
New csv (column of 1s removed, headers skipped):
2_a, 3_b, 4_c, 5_d
2_a, 3_b, 4_c, 5_d
2_a, 3_b, 4_c, 5_d
2_a, 3_b, 4_c, 5_d
Upvotes: 1