Ryan Winstead

Reputation: 83

Calculate while excluding -1's

I have an extremely large tab-delimited file with 10,000+ values. I am trying to find the average of each row and append these new values to a new file. However, values that weren't found are recorded in the file as -1, and including those -1 values when calculating the averages will distort my results. How can I exclude them? The file looks like this:

"HsaEX0029886"  100 -1  -1  100 100 100 100 100 100 -1  100 -1  100
"HsaEX0029895"  100 100 91.49   100 100 100 100 100 97.87   95.29   100 100 93.33
"HsaEX0029923"  0   0   0   -1  0   0   0   0   0   9.09    0   5.26    0

In my code I'm taking the last 3 elements of each row and averaging just those 3 values. If the last 3 elements in a row are 85, 12, and -1, I need to return the average of 85 and 12. Here's my entire code:

with open("PSI_Datatxt.txt", 'rt') as data:
    next(data)
    lis = [line.strip("\n").split("\t") for line in data]        # create a list of lists(each row)
for row in lis:
    x = float(row[11])
    y = float(row[12])
    z = float(row[13])
    avrg = ((x + y + z) / 3)
    with open("DataEditted","a+") as newdata:
        if avrg == -1:
            continue    #skipping lines where all 3 values are -1
        else:
            newdata.write(str(avrg) + ' ' + '\n')
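To see why the -1 placeholders distort the result, here is the arithmetic for the 85, 12, -1 example, comparing the plain three-value mean with a mean over only the valid entries. This is a small sketch; the function names naive_mean and mean_excluding_missing are made up for illustration.

```python
def naive_mean(values):
    # What the question's code computes: every value counts, including -1.
    return sum(values) / len(values)


def mean_excluding_missing(values, missing=-1.0):
    # Keep only the entries that are not the "missing" marker.
    valid = [v for v in values if v != missing]
    if not valid:
        return None  # every entry was missing; there is nothing to average
    return sum(valid) / len(valid)


last_three = [85.0, 12.0, -1.0]
print(naive_mean(last_three))              # 32.0  ((85 + 12 - 1) / 3)
print(mean_excluding_missing(last_three))  # 48.5  ((85 + 12) / 2)
```

Assuming valid values are never negative, as in the sample rows, three values can only average to exactly -1 when all three are -1, so the avrg == -1 check only catches that one case.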

Thanks. Comment if any clarification is needed.

Upvotes: 0

Views: 76

Answers (3)

Foon

Reputation: 6468

# row is one line split on tabs; row[0] is the name
data = [float(x) for x in row[1:] if float(x) > -1]
if data:
    avg = sum(data)/len(data)
else:
    avg = 0  # or raise an exception; you had a row of all -1's

The first line is a standard Python idiom, a list comprehension: given a list (in this case row), [expression for x in list if condition] iterates through it and keeps only the items for which the condition is true.
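Wrapped in a function (the name row_average is just for illustration) and run on the first sample row from the question, the snippet above behaves as follows. This sketch assumes each line has already been split on tabs, so row[0] is the name.

```python
def row_average(row):
    # row is one line already split on tabs: [name, value1, value2, ...]
    # The comprehension converts each value and keeps only those above -1.
    data = [float(x) for x in row[1:] if float(x) > -1]
    if data:
        return sum(data) / len(data)
    return 0  # every value was -1; raising an exception is the other option


sample = ['"HsaEX0029886"', "100", "-1", "-1", "100", "100", "100",
          "100", "100", "100", "-1", "100", "-1", "100"]
print(row_average(sample))  # 100.0 (the four -1 entries are ignored)
```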

If you wanted to only look at the last three values, you have two options depending on what you mean by last three:

data = [float(x) for x in row[-3:] if float(x) > -1]

will look at the last 3 and give you 0 to 3 values back, depending on how many of them are -1.
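A quick check of this slice-then-filter version on a made-up row whose last three columns are 85, 12, -1 (the example from the question):

```python
row = ["name", "40", "85", "12", "-1"]

# Slice first, then filter: look at the last three raw columns only.
last_three = [float(x) for x in row[-3:] if float(x) > -1]
print(last_three)                         # [85.0, 12.0]
print(sum(last_three) / len(last_three))  # 48.5

# If all three trailing columns are -1 the list is empty, so guard before dividing.
tail_all_missing = ["name", "40", "-1", "-1", "-1"]
print([float(x) for x in tail_all_missing[-3:] if float(x) > -1])  # []
```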

data = [float(x) for x in row[1:] if float(x) > -1][-3:]

will give you up to 3 of the last "good" values (if a row is all or almost all -1, you will get fewer than 3).
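The two options differ when -1 values sit at the end of a row. Below is a sketch on a made-up row; note that taking the last three items of a list uses the slice [-3:], whereas [:-3] drops the last three and keeps everything before them.

```python
row = ["name", "5", "10", "20", "30", "-1", "-1"]

# Option 1: last three raw columns, then drop -1, so only one value survives.
raw_tail = [float(x) for x in row[-3:] if float(x) > -1]

# Option 2: drop -1 from the whole row, then keep the last three good values.
good_tail = [float(x) for x in row[1:] if float(x) > -1][-3:]

# For contrast, [:-3] keeps everything EXCEPT the last three items.
all_but_last_three = [float(x) for x in row[1:] if float(x) > -1][:-3]

print(raw_tail)            # [30.0]
print(good_tail)           # [10.0, 20.0, 30.0]
print(all_but_last_three)  # [5.0]
```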

Upvotes: 1

inspectorG4dget

Reputation: 114025

This should do it

import csv


def average(L):
    # Convert every field to float and drop the -1 placeholders.
    L = [i for i in map(float, L) if i != -1]
    if not L:
        return None  # every value in the row was -1
    return sum(L)/len(L)


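with open('path/to/input/file', newline='') as infile, open('path/to/output/file', 'w', newline='') as fout:
    outfile = csv.writer(fout, delimiter='\t')
    next(infile)  # skip the first (header) line, as the question's code does with next(data)
    for name, *vals in csv.reader(infile, delimiter='\t'):
        outfile.writerow((name, average(vals)))  # a None average is written as an empty field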
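To try this approach without real files, here is a sketch that swaps in io.StringIO objects for the input and output. The rows are shortened versions of the question's sample, plus a hypothetical "all_missing" row made entirely of -1 values. csv.reader also strips the double quotes around the names.

```python
import csv
import io


def average(L):
    # Convert every field to float and drop the -1 placeholders.
    L = [i for i in map(float, L) if i != -1]
    if not L:
        return None  # every value in the row was -1
    return sum(L) / len(L)


# In-memory stand-ins for the input and output files.
infile = io.StringIO(
    '"HsaEX0029886"\t100\t-1\t-1\t100\n'
    '"HsaEX0029923"\t0\t0\t0\t-1\n'
    '"all_missing"\t-1\t-1\t-1\t-1\n'
)
fout = io.StringIO()

outfile = csv.writer(fout, delimiter='\t')
for name, *vals in csv.reader(infile, delimiter='\t'):
    outfile.writerow((name, average(vals)))

print(fout.getvalue())
```

Returning None rather than 0 keeps an all -1 row distinguishable from a row whose real values average to 0; csv.writer writes None as an empty string.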

Upvotes: 1

jacoblaw

Reputation: 1283

Here it is in the same format as your original code. It writes an error message if every value in the row is -1 (so there is nothing to average), or you can ignore that case instead and write nothing.

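with open("PSI_Datatxt.txt", 'r') as data, open("DataEditted", "a+") as newdata:
    next(data)  # skip the header line, as in the question's code
    for row in data:
        fields = row.rstrip("\n").split("\t")  # split the line into its tab-separated columns
        vals = [float(val) for val in fields[1:] if float(val) != -1]  # drop the name and the -1s
        try:
            newdata.write(str(sum(vals)/len(vals)) + ' ' + '\n')
        except ZeroDivisionError:  # every value in the row was -1
            newdata.write("My Error Message Here\n")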
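One detail worth checking with this approach: iterating over a file yields each line as a single string, so it has to be split on tabs before the name is sliced off; slicing the string itself would iterate over characters. A small sketch of the per-line logic as a function (line_average and the "all_missing" row are hypothetical):

```python
def line_average(line):
    # Split the raw text line into columns, then drop the name column.
    fields = line.rstrip("\n").split("\t")
    vals = [float(val) for val in fields[1:] if float(val) != -1]
    try:
        return sum(vals) / len(vals)
    except ZeroDivisionError:
        # Every value in the row was -1, so vals is empty.
        return None


print(line_average('"HsaEX0029886"\t100\t-1\t-1\t100\n'))  # 100.0
print(line_average('"all_missing"\t-1\t-1\t-1\n'))         # None

# Without splitting, slicing the string yields characters, not columns:
print(list("ab\tcd"[1:]))  # ['b', '\t', 'c', 'd']
```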

Upvotes: 1
