Reputation: 97

Selecting and operating on columns in a .csv

I have a CSV file with 38 columns and 1500+ rows that contains floats and strings. I want to take 3 columns (x, y, z) of float data from this set and find the average of f=(x+y)/z. After some research I successfully isolated these columns as numpy arrays and computed f=(x+y)/z. Now when I try to sum f, the array isn't added up: when I print f, I see 1500 correct values but not their sum.

  reader=csv.reader(open('myfile.csv' ,"rb"),delimiter=',')
  reader.next()
  reader.next()
  x=list(reader)
  data=numpy.array(x)
  rows=data.shape[0]
  for i in range (0,rows):
      x=numpy.array(data[i,18]).astype('float')
      y=numpy.array(data[i,19]).astype('float')
      z=numpy.array(data[i,6]).astype('float')
      f=numpy.array((x+y)/z)
      average=numpy.sum(f)/rows
      print average

Upvotes: 3

Answers (3)

DrRobotNinja

Reputation: 1421

Numpy lets you operate on arrays as a whole, so you don't need to iterate through them.

import csv
import numpy

# Python 2: the csv module expects the file opened in binary mode
reader=csv.reader(open('myfile.csv' ,"rb"),delimiter=',')
# skip the two header rows
next(reader)
next(reader)
x=list(reader)
data=numpy.array(x)

x=data[:,18].astype('float')
y=data[:,19].astype('float')
z=data[:,6].astype('float')

f = (x + y) / z
average = f.mean()

print average
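
To see why the loop in the question never produces a single total, here is a minimal sketch on a made-up three-row array. The values and the column positions 0, 1, 2 are assumptions for illustration, not the asker's data, and the sketch is written for Python 3. Inside a row-by-row loop, each indexed element is a single value, so f holds one row's result and summing it changes nothing; slicing whole columns gives a 1-D array that can be averaged in one call.

```python
import numpy as np

# Hypothetical stand-in for the parsed CSV: strings, as csv.reader returns them.
data = np.array([["1", "2", "3"],
                 ["4", "5", "6"],
                 ["7", "8", "9"]])

# Row-by-row loop: each result is a single number, one per iteration.
per_row = []
for i in range(data.shape[0]):
    x = float(data[i, 0])
    y = float(data[i, 1])
    z = float(data[i, 2])
    per_row.append((x + y) / z)

# Whole-column slicing: f is a 1-D array with one entry per row.
x = data[:, 0].astype(float)
y = data[:, 1].astype(float)
z = data[:, 2].astype(float)
f = (x + y) / z
average = f.mean()

print(f.shape)
print(average)
```

The loop version only works if the per-row values are collected and averaged after the loop ends, which is what the vectorized form does implicitly.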

Upvotes: 2

Jaime

Reputation: 67427

If data is already an array, you don't need the for loop:

import numpy as np

# slice whole columns instead of indexing one row at a time
x = data[:, 18].astype(float)
y = data[:, 19].astype(float)
z = data[:, 6].astype(float)
f = (x+y) / z
average = np.average(f)

You would probably be better off reading your file with np.loadtxt:

data = np.loadtxt('myfile.csv', dtype=float, delimiter=',', skiprows=2,
                  usecols=(6, 18, 19))

or to get x, y and z directly:

# column 6 is z, column 18 is x, column 19 is y
z, x, y = np.loadtxt('myfile.csv', dtype=float, delimiter=',', skiprows=2,
                     usecols=(6, 18, 19), unpack=True)
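
As a self-contained illustration of the loadtxt call, the sketch below reads from an in-memory buffer rather than a file. The contents, the header lines and the column positions 1, 2, 3 are all made up for the example; it is written for Python 3. The columns are listed in ascending order in usecols, and the names on the left of the assignment are given in that same order.

```python
import io
import numpy as np

# Hypothetical file contents: two header lines, then four comma-separated columns.
text = io.StringIO(
    "title line\n"
    "name,z,x,y\n"
    "a,3,1,2\n"
    "b,6,4,5\n"
    "c,9,7,8\n"
)

# Column 0 holds strings, but it is not listed in usecols, so it is never
# converted to float.
z, x, y = np.loadtxt(text, delimiter=",", skiprows=2,
                     usecols=(1, 2, 3), unpack=True)

f = (x + y) / z
average = f.mean()
print(average)
```

Because only the listed columns are converted, the string columns elsewhere in the file should not get in the way.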

Upvotes: 5

John

Reputation: 13699

If you're not locked into numpy, here is a pure Python solution:

import csv

def f(x, y, z):
    x = float(x)
    y = float(y)
    z = float(z)
    return (x+y)/z

# read every row up front; the with block closes the file afterwards
with open("derp.csv", 'r') as csv_file:
    rows = list(csv.reader(csv_file))
len_of_rows = len(rows)

f_values = []

for row in rows:
    x = row[0]
    y = row[1]
    z = row[2]
    f_values.append(f(x, y, z))

average = sum(f_values)/len_of_rows
print average

Here is what my derp.csv looks like:

1,2,3
4,5,6
7,8,9
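
As a quick check of this approach, the sketch below runs the same calculation on the three sample rows, held in an in-memory buffer (an assumption made so the example needs no file on disk; written for Python 3).

```python
import csv
import io

# The three sample rows from derp.csv, held in memory instead of on disk.
sample = io.StringIO("1,2,3\n4,5,6\n7,8,9\n")

f_values = []
for row in csv.reader(sample):
    # csv.reader yields strings, so convert before doing arithmetic
    x, y, z = (float(v) for v in row[:3])
    f_values.append((x + y) / z)

average = sum(f_values) / len(f_values)
print(f_values)
print(average)
```

The per-row values are (1+2)/3 = 1, (4+5)/6 = 1.5 and (7+8)/9 = 5/3, so the average is 25/18, roughly 1.39.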

Upvotes: 0
