Reputation: 117
So I have a CSV file with a whole mess of numbers and words and I need to truncate the numbers to three decimal places, but everything I've tried so far doesn't work because of the strings: I get an error message saying 'could not convert strings to floats'.
I used the code suggested here: format to 3 decimal places all decimal number in csv file python
and got that error.
The code looks something like (assume all indents are correct):
with open('metaanalysis_data.csv', 'rb') as f_input,
open('metaanalysis_datas.csv', 'wb') as f_output:
csv_input = csv.reader(f_input, quotechar="'")
csv_output = csv.writer(f_output)
csv_output.writerow(next(csv_input))
for cols in csv_input:
for i in cols:
for i in xrange(1, 10):
#if i != str:
cols[i] = '{:.3f}'.format(float(cols[i]))
csv_output.writerow(cols)
I tried adding:
if i is float:
before 'for i in xrange(1, 10)' and I tried the same thing after and it cleared up my error message but resulted in a file that hadn't been modified at all.
Please help! Thanks!
Upvotes: 0
Views: 3044
Reputation: 59731
The problem is that only some of the values in the CSV are float, but for the rest the conversion will fail. You can try:
with open('metaanalysis_data.csv', 'rb') as f_input,
open('metaanalysis_datas.csv', 'wb') as f_output:
csv_input = csv.reader(f_input, quotechar="'")
csv_output = csv.writer(f_output)
csv_output.writerow(next(csv_input))
for row in csv_input:
for i, elem in enumerate(row):
try:
row[i] = '{:.3f}'.format(float(elem))
except ValueError: pass
csv_output.writerow(row)
That way, when the conversion fails it will raise a ValueError
exception and the element row[i]
will remain what it was.
Upvotes: 1
Reputation: 299
Using pandas in Python, you should be able to get this done like this:
# load pandas package
import pandas as pd
# read in initial file
dataset = pd.read_csv('metaanalysis_data.csv')
# create function that tries to round input to three decimal places,
# returns input if failure occurs (e.g. if not a float)
def try_cutoff(x):
try:
return round(float(x), 3)
except Exception:
return x
# loop over each column and apply try_cutoff to each element in each column
for field in dataset.columns:
dataset[field] = dataset[field].map(try_cutoff)
# write new dataset result to CSV file
dataset.to_csv("some_new_file.csv", index = False)
So, effectively, this creates a function, try_cutoff, that will attempt to round an element to three decimal places. If it hits a string, then it'll just return that string. Then, you run this over each column in your data frame, and write the new data frame out to a file.
Please let me know if that works for you.
Upvotes: 5