Robert A.  Fettikowski
Robert A. Fettikowski

Reputation: 185

How to find the average of multiple columns in a file using python

Hi I have a file that consists of too many columns to open in excel. Each column has 10 rows of numerical values 0-2 and has a row saying the title of the column. I would like the output to be the name of the column and the average value of the 10 rows. The file is too large to open in excel 2000 so I have to try using python. Any tips on an easy way to do this.

Here is a sample of the first 3 columns:

Trial1 Trial2 Trial3

1 0 1

0 0 0

0 2 0

2 2 2

1 1 1

1 0 1

0 0 0

0 2 0

2 2 2

1 1 1

I want python to output as a test file

Trial 1 Trial 2 Trial 3 1 2 1 (whatever the averages are)

Upvotes: 2

Views: 5667

Answers (4)

ghollisjr
ghollisjr

Reputation: 153

Less of an answer than it is an alternative understanding of the problem:

You could think of each line being a vector. In this way, the average done column-by-column is just the average of each of these vectors. All you need in order to do this is

  1. A way to read a line into a vector object,

  2. A vector addition operation,

  3. Scalar multiplication (or division) of vectors.

Python comes (I think) with most of this already installed, but this should lead to some easily readable code.

Upvotes: 0

AndiDog
AndiDog

Reputation: 70158

A memory-friendly solution without using any modules:

with open("filename", "rtU") as f:
    columns = f.readline().strip().split(" ")
    numRows = 0
    sums = [0] * len(columns)

    for line in f:
        # Skip empty lines
        if not line.strip():
            continue

        values = line.split(" ")
        for i in xrange(len(values)):
            sums[i] += int(values[i])
        numRows += 1

    for index, summedRowValue in enumerate(sums):
        print columns[index], 1.0 * summedRowValue / numRows

Upvotes: 2

Mark Byers
Mark Byers

Reputation: 838226

You can use the builtin csv module:

import csv
csvReader = csv.reader(open('input.txt'), delimiter=' ')
headers = csvReader.next()
values = [map(int, row) for row in csvReader]

def average(l):
    return float(sum(l)) / len(l)

averages = [int(round(average(trial))) for trial in zip(*values)]

print ' '.join(headers)
print ' '.join(str(x) for x in averages)

Result:

Trial1 Trial2 Trial3
1 1 1

Upvotes: 1

ars
ars

Reputation: 123508

You can use Numpy:

import numpy as np
from StringIO import StringIO

s = StringIO('''\
Trial1 Trial2 Trial3
1 0 1
0 0 0
0 2 0
2 2 2
1 1 1
1 0 1
0 0 0
0 2 0
2 2 2
1 1 1
''')

data = np.loadtxt(s, skiprows=1)  # skip header row
print data.mean(axis=0)  # column means

# OUTPUT: array([ 0.8,  1. ,  0.8])

Note that the first argument to loadtxt could be the name of your file instead of a file like object.

Upvotes: 2

Related Questions