Alice

Reputation: 327

How to read numbers from csv file to calculate the mean of a column and skip empty strings?

I have a CSV file and need to compute the mean of some of its columns. This is what I have so far:

import csv

n = []
with open('tab.csv', newline='') as f:
    for row in csv.reader(f):
        n.append(row[8])

So I end up with a list of strings: n = ['', '', '1.58', ...]. How can I convert these to floats? I tried:

import numpy as np

n_values = np.array(n)
n_values[n_values == ''] = '0'  # compare the array, not the list n
values = n_values.astype(float)  # np.float was removed in NumPy 1.24
np.mean(values)

But the mean is not correct, because the empty strings should be skipped rather than counted as zeros.

Upvotes: 18

Views: 51006

Answers (2)

syviad

Reputation: 400

Just add quoting=csv.QUOTE_NONNUMERIC:

From the csv module documentation: "Instructs reader objects to convert all non-quoted fields to type float."

import csv

with open('tab.csv', newline='') as file:
    reader = csv.reader(file, quoting=csv.QUOTE_NONNUMERIC)
    n = []
    for row in reader:
        n.append(row[8])
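
One caveat worth checking on your own data: as far as I can tell, an empty unquoted field is not converted and still comes back as '', and an unquoted header row would raise ValueError. A minimal sketch that keeps only the floats before averaging, with made-up in-memory data standing in for tab.csv and column 1 standing in for column 8:

```python
import csv
import io

# Hypothetical stand-in for tab.csv: no header, second column has a blank cell
data = io.StringIO("1,1.5\n2,\n3,2.5\n")

reader = csv.reader(data, quoting=csv.QUOTE_NONNUMERIC)
# Keep only the cells the reader actually converted to float
n = [row[1] for row in reader if isinstance(row[1], float)]

# Guard against an empty list so the division cannot fail
mean = sum(n) / len(n) if n else float("nan")
print(mean)
```

Filtering on the type rather than on `== ''` means any other leftover string in that column is skipped as well.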

Upvotes: 20

Padraic Cunningham

Reputation: 180411

Just cast as you append:

n.append(float(row[8]))

If there are empty strings, float() raises ValueError on them, so catch that and skip the row:

try:
    n.append(float(row[8]))
except ValueError:
    continue
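
Put together, the whole loop might look like the sketch below. The helper name column_mean and the in-memory sample data are made up for illustration; with a real file you would pass the open file object and the column index from the question (8) instead.

```python
import csv
import io


def column_mean(lines, index):
    """Mean of one CSV column, skipping cells that cannot be parsed as float."""
    values = []
    for row in csv.reader(lines):
        try:
            values.append(float(row[index]))
        except (ValueError, IndexError):
            # Empty string, non-numeric text (e.g. a header), or a short row
            continue
    # Avoid dividing by zero when no cell could be parsed
    return sum(values) / len(values) if values else float("nan")


# Hypothetical sample: a header plus three rows, one with an empty cell
sample = io.StringIO("name,score\na,1.5\nb,\nc,2.5\n")
print(column_mean(sample, 1))
```

Because skipped cells never reach the list, they do not count toward the divisor, which is what the question asks for.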

Or you might want to try pandas, in particular pandas.read_csv, which reads empty fields as NaN; mean() skips NaN by default:

import pandas as pd

df = pd.read_csv("in.csv")
print(df["col_name"].mean())

Upvotes: 17
