Reputation: 847
Now I know it's usually not feasible to modify a csv file as you are reading from it so you need to create a new csv file and write to it. The problem I'm having is preserving the original order of the data.
The input csv file looks like follows:
C1 C2 C3
apple BANANA Mango
pear PineApple StRaWbeRRy
I want to turn all the data into lower case and output a new csv file that looks like:
C1 C2 C3
apple banana mango
pear pineapple strawberry
So far I can iterate through the input csv file and turn all the values into lower case but I don't know how to rewrite it back into a csv file in that format. The code I have is:
def clean (input)
aList = []
file = open(input, "r")
reader = csv.reader(file, delimiter = ',')
next(reader, None) # Skip the header but I want to preserve it in the output csv file
for row in reader:
for col in row:
aList.append(col.lower())
So now I have a list with all the lowercase data, how do I rewrite it back into a csv file of the same format (same number of rows and columns) as the input including the header row that I skipped in the code.
Upvotes: 4
Views: 75085
Reputation: 11776
Pandas way:
Read the file using pandas and get the dataframe. Then you can simply use lower()
import pandas as pd
def conversion(text):
return text.lower()
df = pd.read_csv(file_path)
df[column_name] = df[column_name].map(conversion)
Or even a single liner:
df[column_name] = df[column_name].apply(lambda x: x.lower()) # If you have nan or other non-string values, you may need to convert x to string first like str(x).lower()
Then you can save it using to_csv function
Upvotes: 11
Reputation: 4992
the easiest way that i found is as follows
let the initial CSV
file name be test.csv
with open('test.csv','r') as f:
with open('cleaned.csv','w') as ff:
ff.write(f.readline())
ff.write(f.read().lower())
the above code will create a new csv with all lower case
Upvotes: 0
Reputation: 5388
If you want to use csv module for all then use following code snippet.
import os
import csv
def clean(input):
tmpFile = "tmp.csv"
with open(input, "r") as file, open(tmpFile, "w") as outFile:
reader = csv.reader(file, delimiter=',')
writer = csv.writer(outFile, delimiter=',')
header = next(reader)
writer.writerow(header)
for row in reader:
colValues = []
for col in row:
colValues.append(col.lower())
writer.writerow(colValues)
os.rename(tmpFile, input)
Upvotes: 3
Reputation: 2637
If all you want to do is change the case of the data and preserve everything else you might be best to skip the csv module and just use a straight file eg:
# Open both files
with open("infile.csv") as f_in, open("outfile.csv", 'w') as f_out:
# Write header unchanged
header = f_in.readline()
f_out.write(header)
# Transform the rest of the lines
for line in f_in:
f_out.write(line.lower())
Upvotes: 5