Reputation: 83
I have an extremely large tab-delimited file with 10000+ values. I am trying to find the average of each row in the data and append these new values to a new file. However, values that weren't found are entered in the large file as -1. Using the -1 values when calculating my averages will mess up my data. How can I exclude these values? The large file structure looks like this:
"HsaEX0029886" 100 -1 -1 100 100 100 100 100 100 -1 100 -1 100
"HsaEX0029895" 100 100 91.49 100 100 100 100 100 97.87 95.29 100 100 93.33
"HsaEX0029923" 0 0 0 -1 0 0 0 0 0 9.09 0 5.26 0
In my code I'm taking the last 3 elements of each row and averaging just those 3 values. If the last 3 elements in the row are 85, 12, and -1, I need to return the average of 85 and 12. Here's my entire code:
with open("PSI_Datatxt.txt", 'rt') as data:
    next(data)
    lis = [line.strip("\n").split("\t") for line in data]  # create a list of lists (each row)
    for row in lis:
        x = float(row[11])
        y = float(row[12])
        z = float(row[13])
        avrg = ((x + y + z) / 3)
        with open("DataEditted","a+") as newdata:
            if avrg == -1:
                continue  # skipping lines where all 3 values are -1
            else:
                newdata.write(str(avrg) + ' ' + '\n')
Thanks. Comment if any clarification is needed.
Upvotes: 0
Views: 76
Reputation: 6468
data = [float(x) for x in row[1:] if float(x) > -1]
if data:
    avg = sum(data)/len(data)
else:
    avg = 0  # or throw an exception; you had a row of all -1's
The first line is a list comprehension, a fairly standard Python idiom: given a list (in this case row), you can iterate through it and filter out unwanted items by using the for x in row if condition form.
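As a small illustration of that filter, here is the first sample row from the question run through the same kind of comprehension. The helper name average_valid and its missing parameter are hypothetical, not something from the question, and returning None for a row of all -1's is only one possible choice.

```python
def average_valid(fields, missing=-1.0):
    """Average the numeric fields, ignoring entries equal to the missing marker."""
    values = [float(x) for x in fields]
    kept = [v for v in values if v != missing]
    if not kept:
        return None  # assumption: a row with no usable values has no average
    return sum(kept) / len(kept)

# First sample row from the question, split on tabs as in the original code.
row = '"HsaEX0029886"\t100\t-1\t-1\t100\t100\t100\t100\t100\t100\t-1\t100\t-1\t100'.split("\t")

print(average_valid(row[1:]))   # averages the nine values that are not -1
print(average_valid(row[-3:]))  # averages only the usable values among the last three
```

Converting each field once and then filtering avoids calling float() twice per value, which may matter on a very large file.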
If you wanted to only look at the last three values, you have two options depending on what you mean by last three:
data = [float(x) for x in row[-3:] if float(x) > -1]
will look at the last 3 and give you 0 to 3 values back, depending on how many of them are -1.
data = [float(x) for x in row[1:] if float(x) > -1][-3:]
will give you up to 3 of the last "good" values (if a given row is all or almost all -1, you will get fewer than 3)
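To make the difference between the two readings of "last three" concrete, here is a minimal sketch on a made-up row. The row contents and the function names last_three_window and last_three_valid are hypothetical and only serve the illustration.

```python
def last_three_window(row, missing=-1.0):
    """Take the last three fields first, then drop the missing ones (0 to 3 values)."""
    return [float(x) for x in row[-3:] if float(x) != missing]

def last_three_valid(row, missing=-1.0):
    """Drop the missing fields first, then take up to three values from the end."""
    return [float(x) for x in row[1:] if float(x) != missing][-3:]

# Hypothetical row: an ID followed by six measurements, three of them missing.
row = ["id", "85", "12", "-1", "40", "-1", "-1"]

print(last_three_window(row))  # [40.0]
print(last_three_valid(row))   # [85.0, 12.0, 40.0]
```

Note that the slice at the end is [-3:], which keeps the final three items; [:-3] would instead drop them.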
Upvotes: 1
Reputation: 114025
This should do it
import csv

def average(L):
    L = [i for i in map(float, L) if i != -1]  # drop the -1 placeholders
    if not L: return None  # every value in the row was -1
    return sum(L)/len(L)

with open('path/to/input/file') as infile, open('path/to/output/file', 'w') as fout:
    outfile = csv.writer(fout, delimiter='\t')
    for name, *vals in csv.reader(infile, delimiter='\t'):
        outfile.writerow((name, average(vals)))
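To try this kind of loop without touching real files, it can be run against the three sample rows from the question using in-memory buffers. This is only a sketch: io.StringIO stands in for the input and output files, and the results list is a hypothetical addition for inspecting the averages.

```python
import csv
import io

def average(values, missing=-1.0):
    """Mean of the values that are not the missing marker, or None if none remain."""
    kept = [v for v in map(float, values) if v != missing]
    return sum(kept) / len(kept) if kept else None

# The three sample rows from the question, tab-delimited.
sample = (
    '"HsaEX0029886"\t100\t-1\t-1\t100\t100\t100\t100\t100\t100\t-1\t100\t-1\t100\n'
    '"HsaEX0029895"\t100\t100\t91.49\t100\t100\t100\t100\t100\t97.87\t95.29\t100\t100\t93.33\n'
    '"HsaEX0029923"\t0\t0\t0\t-1\t0\t0\t0\t0\t0\t9.09\t0\t5.26\t0\n'
)

infile = io.StringIO(sample, newline="")  # stand-in for the input file
fout = io.StringIO(newline="")            # stand-in for the output file
writer = csv.writer(fout, delimiter="\t")

results = []
for name, *vals in csv.reader(infile, delimiter="\t"):
    avg = average(vals)
    results.append((name, avg))
    writer.writerow((name, avg))

print(fout.getvalue())
```

With its default settings, csv.reader removes the double quotes around the IDs, so the names are written back out unquoted.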
Upvotes: 1
Reputation: 1283
Here it is in the same format as your original question. It lets you write an error message if the row is all -1's, or you can ignore such rows instead and write nothing.
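with open("PSI_Datatxt.txt", 'r') as data:
    next(data)  # skip the first line, as the question's code does
    for row in data:
        fields = row.rstrip("\n").split("\t")  # split the line into columns
        vals = [float(val) for val in fields[1:] if float(val) != -1]
        with open("DataEditted","a+") as newdata:
            try:
                newdata.write(str(sum(vals)/len(vals)) + ' ' + '\n')
            except ZeroDivisionError:  # vals is empty: every value was -1
                newdata.write("My Error Message Here\n")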
Upvotes: 1