Engine

Reputation: 5422

Getting the variance using NumPy

I want to get the variance of each column in a CSV file. For that, I've written the following:

import numpy as np
import csv
import collections
Training        = 'Training.csv'
inputFile       = open(Training,'r',newline='')
cols_values     = collections.defaultdict(list)
numericalValues = []
reader = csv.reader(inputFile)
row = next(reader)

for row in reader:
    for col, value in enumerate(row):
        cols_values[col].append(value)
        numericalValues.append(cols_values[col])

np.var(numericalValues[0], dtype=np.float64)

I get an error on the np.var line:

TypeError: cannot perform reduce with flexible type

Any idea what I'm missing? The values are definitely digits!

Upvotes: 0

Views: 627

Answers (1)

Micah Smith

Reputation: 4463

Is there a reason not to use pandas for this?

import numpy as np
import pandas as pd
Training = 'Training.csv'
df = pd.read_csv(Training)
df.apply(np.var, axis=0)      # can also use `df.var(...)`

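Make sure that all of your columns hold numeric values. csv.reader yields every cell as a string, and np.var cannot reduce an array of strings, which is what the "flexible type" error is telling you. Also note that np.var defaults to ddof=0 (population variance) while df.var defaults to ddof=1 (sample variance), so the two can give different numbers. You can use np.nanvar if you want to ignore NaN values.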
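If some cells might not parse as numbers, one option is to coerce them first and then take the variance while skipping the resulting NaN values. This is only a rough sketch: the helper name `column_variances` and the inline sample data (standing in for Training.csv) are made up for illustration, and it assumes that silently dropping unparseable cells is acceptable for your data.

```python
import io

import numpy as np
import pandas as pd


def column_variances(csv_source, ddof=0):
    """Return the per-column variance, skipping cells that are not numeric.

    Hypothetical helper for illustration; csv_source is a path or file-like object.
    """
    df = pd.read_csv(csv_source)
    # Coerce every column to numbers; cells that cannot be parsed become NaN.
    numeric = df.apply(pd.to_numeric, errors="coerce")
    # np.nanvar ignores NaN values; ddof=0 matches np.var's default.
    values = np.nanvar(numeric.to_numpy(dtype=float), axis=0, ddof=ddof)
    return pd.Series(values, index=numeric.columns)


# Made-up sample data; the last row has a non-numeric cell in column "b".
sample = io.StringIO("a,b\n1,2\n2,4\n3,oops\n")
variances = column_variances(sample)
print(variances)
```

Pass `ddof=1` if you want the sample variance that `df.var()` reports by default. If a bad cell should be treated as an error rather than skipped, drop `errors="coerce"` so that `pd.to_numeric` raises instead.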

Upvotes: 1
