Reputation: 185
Hi I have a file that consists of too many columns to open in excel. Each column has 10 rows of numerical values 0-2 and has a row saying the title of the column. I would like the output to be the name of the column and the average value of the 10 rows. The file is too large to open in excel 2000 so I have to try using python. Any tips on an easy way to do this.
Here is a sample of the first 3 columns:
Trial1 Trial2 Trial3
1 0 1
0 0 0
0 2 0
2 2 2
1 1 1
1 0 1
0 0 0
0 2 0
2 2 2
1 1 1
I want python to output as a test file
Trial 1 Trial 2 Trial 3 1 2 1 (whatever the averages are)
Upvotes: 2
Views: 5667
Reputation: 153
Less of an answer than it is an alternative understanding of the problem:
You could think of each line being a vector. In this way, the average done column-by-column is just the average of each of these vectors. All you need in order to do this is
A way to read a line into a vector object,
A vector addition operation,
Scalar multiplication (or division) of vectors.
Python comes (I think) with most of this already installed, but this should lead to some easily readable code.
Upvotes: 0
Reputation: 70158
A memory-friendly solution without using any modules:
with open("filename", "rtU") as f:
columns = f.readline().strip().split(" ")
numRows = 0
sums = [0] * len(columns)
for line in f:
# Skip empty lines
if not line.strip():
continue
values = line.split(" ")
for i in xrange(len(values)):
sums[i] += int(values[i])
numRows += 1
for index, summedRowValue in enumerate(sums):
print columns[index], 1.0 * summedRowValue / numRows
Upvotes: 2
Reputation: 838226
You can use the builtin csv
module:
import csv
csvReader = csv.reader(open('input.txt'), delimiter=' ')
headers = csvReader.next()
values = [map(int, row) for row in csvReader]
def average(l):
return float(sum(l)) / len(l)
averages = [int(round(average(trial))) for trial in zip(*values)]
print ' '.join(headers)
print ' '.join(str(x) for x in averages)
Result:
Trial1 Trial2 Trial3 1 1 1
Upvotes: 1
Reputation: 123508
You can use Numpy:
import numpy as np
from StringIO import StringIO
s = StringIO('''\
Trial1 Trial2 Trial3
1 0 1
0 0 0
0 2 0
2 2 2
1 1 1
1 0 1
0 0 0
0 2 0
2 2 2
1 1 1
''')
data = np.loadtxt(s, skiprows=1) # skip header row
print data.mean(axis=0) # column means
# OUTPUT: array([ 0.8, 1. , 0.8])
Note that the first argument to loadtxt
could be the name of your file instead of a file like object.
Upvotes: 2