Getting the variance using NumPy

Question

I want to get the variance of each column in a CSV file. For that I've written the following:

import numpy as np
import csv
import collections
Training        = 'Training.csv'
inputFile       = open(Training,'r',newline='')
cols_values     = collections.defaultdict(list)
numericalValues = []
reader = csv.reader(inputFile)
row = next(reader)

for row in reader:
    for col, value in enumerate(row):
        cols_values[col].append(value)
        numericalValues.append(cols_values[col])

np.var(numericalValues[0], dtype=np.float64)

I get an error on the np.var line:

TypeError: cannot perform reduce with flexible type

Any idea what I'm missing? The values are definitely digits!

Micah Smith · Accepted Answer

Is there a reason to not use Pandas for this?

import numpy as np
import pandas as pd
Training = 'Training.csv'
df = pd.read_csv(Training)
df.apply(np.var, axis=0)      # can also use `df.var(...)`
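
One thing to watch when moving between the two spellings: np.var defaults to ddof=0 (population variance, divide by N), while pandas' var defaults to ddof=1 (sample variance, divide by N - 1), so the numbers differ unless ddof is set explicitly. A small sketch of the difference, using a made-up two-column frame in place of Training.csv (the column names and values are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for Training.csv
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0],
                   "b": [10.0, 20.0, 30.0, 40.0]})

population_var = df.var(ddof=0)   # divides by N, same convention as np.var's default
sample_var = df.var()             # pandas default ddof=1, divides by N - 1

# np.var on the underlying array agrees with the ddof=0 result
numpy_var = np.var(df.to_numpy(), axis=0)
```

Passing ddof explicitly in either library removes the ambiguity.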

You want to make sure that all of your columns have numerical values. You can also use np.nanvar to ignore NaN values if you choose.
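
For context, the TypeError in the question comes from csv.reader returning every field as a string, so np.var ends up reducing over an array of strings rather than numbers. Below is a rough sketch of two ways around that, assuming the file has a header row followed by numeric columns. The CSV text, the column names, and the stray "oops" cell are all invented for illustration; pd.to_numeric with errors="coerce" turns cells that cannot be parsed into NaN.

```python
import csv
import io

import numpy as np
import pandas as pd

# Hypothetical CSV contents standing in for Training.csv
csv_text = "x,y\n1,10\n2,oops\n3,30\n"

# pandas route: coerce every column to numbers; unparseable cells become NaN
df = pd.read_csv(io.StringIO(csv_text))
numeric = df.apply(pd.to_numeric, errors="coerce")
col_var = np.nanvar(numeric.to_numpy(dtype=float), axis=0)  # skips NaN, ddof=0

# csv-module route: convert each field to float while reading
reader = csv.reader(io.StringIO("x,y\n1,10\n2,20\n3,30\n"))
next(reader)                                        # skip the header row
rows = [[float(v) for v in row] for row in reader]  # strings -> floats
var_per_column = np.var(np.array(rows), axis=0)     # one variance per column
```

With the csv-module route, float(v) raises ValueError on a non-numeric cell, which is a quick way to find out whether the values really are all digits.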
