Reputation: 13

Calculating an average for every X number of lines

I am trying to take data from a text file and calculate an average for every 600 lines of that file. I load the text from the file, put it into a numpy array, and enumerate it. I can get the average for the first 600 lines, but I'm not sure how to write a loop so that Python calculates an average for every 600 lines and then writes the results to a new text file. Here is my code so far:

import numpy as np

#loads file and places it in array
data = np.loadtxt('244UTZ10htz.txt', delimiter = '\t', skiprows = 2)
shape = np.shape(data)

#creates array for u wind values
for i,d in enumerate(data):
    data[i] = (d[3])
    if i == 600:
        minavg = np.mean(data[i == 600])

#finds total u mean for day
ubar = np.mean(data)

Upvotes: 2

Answers (4)

mdadm

Reputation: 1363

Based on what I understand from your question, it sounds like you have a file and want the mean of each block of 600 lines, repeated until there is no more data. So once you have read 600 lines you average lines 0 to 599, once you have read 1200 you average lines 600 to 1199, and so on.

Modulus division is one way to trigger the average at every 600th line without using a separate variable to keep count of how many lines you've looped through. Additionally, I used NumPy array slicing to create a view of the original data containing only the 4th column of the data set.

This example should do what you want, but it is entirely untested... I'm also not terribly familiar with numpy, so there are some better ways to do this, as mentioned in the other answers:

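import numpy as np

#loads file and places it in array
data = np.loadtxt('244UTZ10htz.txt', delimiter = '\t', skiprows = 2)
data_you_want = data[:,3]   # view of the 4th column (u wind values)
daily_averages = list()

#averages each completed block of 600 lines
for i,d in enumerate(data_you_want):
    # i is 0-based, so a block is complete when (i + 1) is a multiple of 600;
    # testing i % 600 == 0 would fire at i == 0 and average an empty slice
    if ((i + 1) % 600) == 0:
        avg_for_day = np.mean(data_you_want[i + 1 - 600:i + 1])
        daily_averages.append(avg_for_day)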

You can either modify the example above to write the mean out to a new file, instead of appending to a list as I have done, or just write the daily_averages list out to whatever file you want.
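For the writing step, a minimal sketch using np.savetxt might look like the following. The list contents and the output path are placeholders for illustration, not values from the question's data, and the number format is just one reasonable choice.

```python
import os
import tempfile

import numpy as np

# Stand-in values; in the answer above these would come from the loop.
daily_averages = [1.5, 2.25, 3.0]

# Hypothetical output path, placed in a temp directory for this sketch.
out_path = os.path.join(tempfile.mkdtemp(), 'daily_averages.txt')

# Write one average per line with six decimal places.
np.savetxt(out_path, daily_averages, fmt='%.6f')

# Read the file back to confirm the round trip.
written = np.loadtxt(out_path)
```

In real use you would replace out_path with whatever file name you want the averages stored under.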

As a bonus, here is a Python solution using only the csv module. It hasn't been tested much, but theoretically should work and might be fairly easy to understand for someone new to Python.

import csv

data = list()
daily_average = list()
num_lines = 600

with open('testme.csv', 'r') as csvfile:
    reader = csv.reader(csvfile, delimiter="\t")

    for i,row in enumerate(reader):
        # float() rather than int() so decimal values parse too
        data.append(float(row[3]))

        # after every num_lines rows, average the block just completed
        if ((i + 1) % num_lines) == 0:
            average = sum(data[-num_lines:]) / num_lines
            daily_average.append(average)

Hope this helps!

Upvotes: 4

M4rtini

Reputation: 13549

Something like this works. Maybe not that readable. But should be fairly fast.

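n = data.shape[0] // 600   # number of complete 600-line blocks
interestingData = data[:,3]
# drop any trailing partial block, then take one mean per row of 600 values
daily_averages = np.mean(interestingData[:600*n].reshape(-1, 600), axis=1)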
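To see what the reshape is doing, here is a small sketch on made-up data, with a block size of 3 standing in for 600. The leftover handling at the end is an addition for illustration; the snippet above simply discards any rows that do not fill a complete block.

```python
import numpy as np

block = 3                              # stands in for 600
values = np.arange(10, dtype=float)    # 0.0 .. 9.0, so one value is left over

# Number of complete blocks, then one row per block and a mean per row.
n_full = len(values) // block
block_means = values[:n_full * block].reshape(-1, block).mean(axis=1)

# Values that did not fill a complete block (assumption: average them separately).
tail = values[n_full * block:]
tail_mean = tail.mean() if tail.size else None
```

Whether the partial block should be averaged, dropped, or padded depends on what the averages are used for.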

Upvotes: 0

TooTone

Reputation: 8146

The following program uses array slicing to get the column, and then a list comprehension indexing into the column to get the means. It might be simpler to use a for loop for the latter.

Slicing / indexing into the array rather than creating new objects also has the advantage of speed as you're just creating new views into existing data.

import numpy as np

# test data
nr = 11
nc = 3
a = np.array([np.array(range(nc))+i*10 for i in range(nr)])
print(a)

# slice to get column
col = a[:,1]
print(col)

# comprehension to step through column to get means
# (float() gives plain Python floats so the list prints cleanly)
numpermean = 2
means = [float(np.mean(col[i:(min(len(col), i+numpermean))]))
         for i in range(0,len(col),numpermean)]

print(means)

It prints:

[[  0   1   2]
 [ 10  11  12]
 [ 20  21  22]
 [ 30  31  32]
 [ 40  41  42]
 [ 50  51  52]
 [ 60  61  62]
 [ 70  71  72]
 [ 80  81  82]
 [ 90  91  92]
 [100 101 102]]
[  1  11  21  31  41  51  61  71  81  91 101]
[6.0, 26.0, 46.0, 66.0, 86.0, 101.0]
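The point about views can be checked on a small array. This is only a sketch with made-up data; np.shares_memory reports whether two arrays overlap in memory, which is the case for basic slices like the ones used here.

```python
import numpy as np

a = np.arange(12).reshape(4, 3)

# A basic slice of a column is a view, not a copy.
col = a[:, 1]
shares = np.shares_memory(a, col)

# Writing through the parent array is visible in the view.
a[0, 1] = 99
seen_in_view = col[0]

# Slicing the view again (as the comprehension does) gives another view.
chunk = col[0:2]
chunk_shares = np.shares_memory(a, chunk)
```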

Upvotes: 0

Trond Kristiansen

Reputation: 2456

A simple solution would be:

import numpy as np
data = np.loadtxt('244UTZ10htz.txt', delimiter = '\t', skiprows = 2)
mydata=[]; counter=0
averages=[]
for i,d in enumerate(data):
    mydata.append((d[3]))
    counter+=1

    # Find the average of the previous 600 lines
    if counter == 600:
        minavg = np.mean(np.asarray(mydata))
        averages.append(minavg)   # keep each block's average

        # reset the counter and start counting from 0
        counter=0; mydata=[]
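The same counter idea can be sketched without numpy, keeping a running total instead of a list. The function name block_means is hypothetical, and averaging the leftover partial block at the end is an assumption; drop that final step if only full blocks matter.

```python
def block_means(values, block_size):
    """Yield the mean of each full block, then of any leftover partial block."""
    total = 0.0
    count = 0
    for v in values:
        total += v
        count += 1
        if count == block_size:
            yield total / block_size
            # reset for the next block
            total = 0.0
            count = 0
    if count:
        # fewer than block_size values remained at the end
        yield total / count


# Small made-up input with a block size of 3 standing in for 600.
result = list(block_means([1, 2, 3, 4, 5, 6, 7], 3))
```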

Upvotes: 0
