Reputation: 81
I have a file below where I want to convert what is written on every fourth line into a number.
sample.fastq
@HISE
GGATCGCAATGGGTA
+
CC@!$%*&J#':AAA
@HISE
ATCGATCGATCGATA
+
()**D12EFHI@$;;
Every fourth line is a series of characters, each of which corresponds to a number (stored in a dictionary). I would like to convert each character into its corresponding number and then find the average of all the numbers on that line.
I have gotten as far as displaying each of the characters individually, but I'm stumped as to how to replace the characters with their numbers and then go on from there.
script.py
d = {
'!':0, '"':1, '#':2, '$':3, '%':4, '&':5, '\'':6, '(':7, ')':8,
'*':9, '+':10, ',':11, '-':12, '.':13, '/':14, '0':15,'1':16,
'2':17, '3':18, '4':19, '5':20, '6':21, '7':22, '8':23, '9':24,
':':25, ';':26, '<':27, '=':28, '>':29, '?':30, '@':31, 'A':32, 'B':33,
'C':34, 'D':35, 'E':36, 'F':37, 'G':38, 'H':39, 'I':40, 'J':41 }
with open('sample.fastq') as fin:
    for i in fin.readlines()[3::4]:
        for j in i:
            print(j)
The output should be as below and stored in a new file.
output.txt
@HISE
GGATCGCAATGGGTA
+
19 #From 34 34 31 0 3 4 9 5 41 2 6 25 32 32 32
@HISE
ATCGATCGATCGATA
+
23 #From 7 8 9 9 35 16 17 36 37 39 40 31 3 26 26
Is what I'm proposing possible?
Upvotes: 0
Views: 58
Reputation: 78610
You can do this with a for loop over the input file lines:
with open('sample.fastq') as fin, open('outfile.fastq', "w") as outf:
    for i, line in enumerate(fin):
        if i % 4 == 3:  # only change every fourth line
            # strip the trailing newline before looking up each character
            qualities = [d[ch] for ch in line.rstrip("\n")]
            # take the average quality score; // truncates it to an integer
            # (19.33 -> 19, 22.6 -> 22) in both Python 2 and Python 3
            average = sum(qualities) // len(qualities)
            # new version: the average with "\n" at the end
            line = str(average) + "\n"
        # write the line (or the new version thereof)
        outf.write(line)
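This produces the output below. Note that the second average is truncated to 22 (339 / 15 = 22.6), whereas the output in the question shows the rounded value 23: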
@HISE
GGATCGCAATGGGTA
+
19
@HISE
ATCGATCGATCGATA
+
22
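If the rounded values shown in the question (19 and 23) are wanted rather than truncated ones, the average can be rounded to the nearest integer instead. The following is only a sketch: `mean_quality` is a hypothetical helper name, and the dictionary is rebuilt here on the assumption that it always maps a character to its ASCII code minus 33, which holds for every entry listed in the question.

```python
# Assumption: same mapping as the question's dictionary ('!' -> 0 ... 'J' -> 41).
d = {chr(code + 33): code for code in range(42)}


def mean_quality(quality_line, table):
    """Return the mean score of one quality line, rounded half up."""
    scores = [table[ch] for ch in quality_line.rstrip("\n")]
    mean = sum(scores) / float(len(scores))
    # Adding 0.5 before truncating rounds non-negative values half up.
    return int(mean + 0.5)


print(mean_quality("CC@!$%*&J#':AAA\n", d))
print(mean_quality("()**D12EFHI@$;;\n", d))
```

The helper could replace the `sum(qualities) // len(qualities)` step in the loop; an empty quality line would raise `ZeroDivisionError`, so real input may need a guard.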
Upvotes: 1
Reputation: 13850
Assuming you read from stdin and write to stdout:
from sys import stdin

for i, line in enumerate(stdin, 1):
    line = line.rstrip("\n")  # remove the trailing newline
    if i % 4 != 0:
        print(line)
        continue
    nums = [d[c] for c in line]  # d is the dictionary from the question
    print(sum(nums) / float(len(nums)))
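As a side note, every entry in the question's dictionary equals the character's ASCII code minus 33, so the lookup table may not be needed at all. A minimal sketch under that assumption follows; `PHRED_OFFSET` and `average_quality_lines` are hypothetical names chosen for illustration, and characters outside the dictionary's range would be converted silently rather than raising `KeyError`.

```python
PHRED_OFFSET = 33  # assumption: score = ASCII code - 33, as in the question's dictionary


def average_quality_lines(lines):
    """Yield each line unchanged, except every fourth, which becomes its mean score."""
    for i, line in enumerate(lines, 1):
        line = line.rstrip("\n")
        if i % 4 != 0:
            yield line
            continue
        scores = [ord(ch) - PHRED_OFFSET for ch in line]
        # float() keeps the fractional part in both Python 2 and Python 3
        yield str(sum(scores) / float(len(scores)))


if __name__ == "__main__":
    record = ["@HISE\n", "GGATCGCAATGGGTA\n", "+\n", "CC@!$%*&J#':AAA\n"]
    for out in average_quality_lines(record):
        print(out)
```

Because the function accepts any iterable of lines, an open file or `sys.stdin` can be passed in directly.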
Upvotes: 0