I want to get the variance of each column in a CSV file. For that, I've written the following:
import numpy as np
import csv
import collections
Training = 'Training.csv'
inputFile = open(Training,'r',newline='')
cols_values = collections.defaultdict(list)
numericalValues = []
reader = csv.reader(inputFile)
row = next(reader)
for row in reader:
    for col, value in enumerate(row):
        cols_values[col].append(value)
        numericalValues.append(cols_values[col])
np.var(numericalValues[0], dtype=np.float64)
I get an error on the np.var line:
TypeError: cannot perform reduce with flexible type
Any idea what I'm missing? The values are definitely digits!
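The likely cause is that csv.reader returns every field as a string, so numericalValues[0] is a list of str and NumPy builds a string ("flexible") array that it cannot reduce. A minimal sketch of the same csv-module approach that converts each field to float before computing the variance; the inline sample data and the io.StringIO stand-in for Training.csv are hypothetical, and it assumes every field after the header row is numeric.

```python
import csv
import io

import numpy as np

# Hypothetical stand-in for Training.csv: a header row followed by numeric rows.
SAMPLE_CSV = """a,b,c
1,2,3
4,5,6
7,8,10
"""


def column_variances(file_obj):
    """Return the population variance (ddof=0) of every column as a float array."""
    reader = csv.reader(file_obj)
    next(reader)  # skip the header row
    # csv.reader yields strings, so convert every field to float explicitly.
    rows = [[float(value) for value in row] for row in reader]
    data = np.array(rows, dtype=np.float64)  # shape: (n_rows, n_cols)
    return np.var(data, axis=0)  # one variance per column


variances = column_variances(io.StringIO(SAMPLE_CSV))
print(variances)
```

With the real file you would pass the object from `open('Training.csv', newline='')` (ideally inside a `with` block) instead of the StringIO object.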
Is there a reason to not use Pandas for this?
import numpy as np
import pandas as pd
Training = 'Training.csv'
df = pd.read_csv(Training)
df.apply(np.var, axis=0) # can also use `df.var(...)`
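One detail behind the `df.var(...)` comment: `np.var` computes the population variance (`ddof=0`) by default, whereas `DataFrame.var` defaults to the sample variance (`ddof=1`), so pass `ddof=0` if you want the two to agree. A small sketch, using hypothetical inline data in place of Training.csv:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for Training.csv.
SAMPLE_CSV = """a,b,c
1,2,3
4,5,6
7,8,10
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# DataFrame.var defaults to the sample variance (ddof=1) ...
sample_var = df.var()
# ... while ddof=0 gives the population variance, matching np.var's default.
population_var = df.var(ddof=0)
numpy_var = np.var(df.to_numpy(dtype=np.float64), axis=0)

print(sample_var)
print(population_var)
```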
You want to make sure that all of your columns have numerical values. You can also use np.nanvar to ignore NaN values if you choose.
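One way to make sure the columns are numeric (an assumption about what the data looks like, not something the question states) is to coerce them with `pd.to_numeric(..., errors="coerce")`, which turns unparseable entries into NaN, and then use a NaN-aware variance such as `np.nanvar`. A sketch with hypothetical data containing one bad entry:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical data: column "b" contains a non-numeric entry ("bad"),
# so read_csv does not parse it as a numeric column.
SAMPLE_CSV = """a,b
1,2
4,bad
7,8
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(df.dtypes)  # "b" is not numeric because of the "bad" entry

# Coerce every column to numbers; anything unparseable becomes NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")

# np.nanvar ignores NaN and computes the population variance (ddof=0) ...
nan_var = np.nanvar(numeric.to_numpy(dtype=np.float64), axis=0)
# ... and DataFrame.var skips NaN by default; ddof=0 makes it match np.nanvar.
pandas_var = numeric.var(ddof=0)
print(nan_var)
```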