birdoptera
birdoptera

Reputation: 117

Truncate numbers to 3 decimal places in CSV file while ignoring strings

So I have a CSV file with a whole mess of numbers and words and I need to truncate the numbers to three decimal places, but everything I've tried so far doesn't work because of the strings: I get an error message saying 'could not convert strings to floats'.

I used the code suggested here: format to 3 decimal places all decimal number in csv file python

and got that error.

The code looks something like (assume all indents are correct):

with open('metaanalysis_data.csv', 'rb') as f_input, 
open('metaanalysis_datas.csv', 'wb') as f_output:
    csv_input = csv.reader(f_input, quotechar="'")
    csv_output = csv.writer(f_output)
    csv_output.writerow(next(csv_input))

    for cols in csv_input:
        for i in cols:
            for i in xrange(1, 10):
            #if i != str:
                cols[i] = '{:.3f}'.format(float(cols[i]))
        csv_output.writerow(cols)

I tried adding:

if i is float:

before 'for i in xrange(1, 10)' and I tried the same thing after and it cleared up my error message but resulted in a file that hadn't been modified at all.

Please help! Thanks!

Upvotes: 0

Views: 3044

Answers (2)

javidcf
javidcf

Reputation: 59731

The problem is that only some of the values in the CSV are float, but for the rest the conversion will fail. You can try:

with open('metaanalysis_data.csv', 'rb') as f_input, 
open('metaanalysis_datas.csv', 'wb') as f_output:
    csv_input = csv.reader(f_input, quotechar="'")
    csv_output = csv.writer(f_output)
    csv_output.writerow(next(csv_input))

    for row in csv_input:
        for i, elem in enumerate(row):
            try:
                row[i] = '{:.3f}'.format(float(elem))
            except ValueError: pass
        csv_output.writerow(row)

That way, when the conversion fails it will raise a ValueError exception and the element row[i] will remain what it was.

Upvotes: 1

atreadw
atreadw

Reputation: 299

Using pandas in Python, you should be able to get this done like this:

# load pandas package
import pandas as pd

# read in initial file
dataset = pd.read_csv('metaanalysis_data.csv')

# create function that tries to round input to three decimal places,
# returns input if failure occurs (e.g. if not a float)
def try_cutoff(x):

    try:
        return round(float(x), 3)
    except Exception:
        return x

# loop over each column and apply try_cutoff to each element in each column
for field in dataset.columns:


    dataset[field] = dataset[field].map(try_cutoff)

# write new dataset result to CSV file
dataset.to_csv("some_new_file.csv", index = False)

So, effectively, this creates a function, try_cutoff, that will attempt to round an element to three decimal places. If it hits a string, then it'll just return that string. Then, you run this over each column in your data frame, and write the new data frame out to a file.

Please let me know if that works for you.

Upvotes: 5

Related Questions