lenhun

Reputation: 81

Read and replace the contents of a line using dictionary

In the file below, I want to convert what is written on every fourth line into a number.

sample.fastq

@HISE
GGATCGCAATGGGTA
+
CC@!$%*&J#':AAA
@HISE
ATCGATCGATCGATA
+
()**D12EFHI@$;;

Every fourth line is a series of characters, each of which maps to a number (stored in a dictionary). I would like to convert each character into its corresponding number and then find the average of all the numbers on that line.

I have gotten as far as displaying each of the characters individually, but I’m stumped as to how to replace the characters with their numbers and then go further.

script.py

d = {
'!':0, '"':1, '#':2, '$':3, '%':4, '&':5, '\'':6, '(':7, ')':8,
'*':9, '+':10, ',':11, '-':12, '.':13, '/':14, '0':15,'1':16,
'2':17, '3':18, '4':19, '5':20, '6':21, '7':22, '8':23, '9':24,
':':25, ';':26, '<':27, '=':28, '>':29, '?':30, '@':31, 'A':32, 'B':33,
'C':34, 'D':35, 'E':36, 'F':37, 'G':38, 'H':39, 'I':40, 'J':41 }


with open('sample.fastq') as fin:
    for i in fin.readlines()[3::4]:
        for j in i:
            print j

The output should be as below and stored in a new file.

output.txt

@HISE
GGATCGCAATGGGTA
+
19 #From 34 34 31 0 3 4 9 5 41 2 6 25 32 32 32
@HISE
ATCGATCGATCGATA
+
23 #From 7 8 9 9 35 16 17 36 37 39 40 31 3 26 26

Is what I’m proposing possible?

Upvotes: 0

Views: 58

Answers (2)

David Robinson

Reputation: 78610

You can do this with a for loop over the input file lines:

with open('sample.fastq') as fin, open('outfile.fastq', "w") as outf:
    for i, line in enumerate(fin):
        if i % 4 == 3:  # only change every fourth line
            # strip the trailing newline so it is not looked up in d
            # (d is the dictionary from the question)
            qualities = [d[ch] for ch in line.rstrip("\n")]
            # take the average quality score. Note that // truncates the
            # average to an integer (22.6 becomes 22); it does not round
            average = sum(qualities) // len(qualities)
            # new version; average with \n at end
            line = str(average) + "\n"

        # write line (or new version thereof)
        outf.write(line)

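This produces the output below. Note that the second average comes out as 22 rather than the 23 in your example, because 339 / 15 = 22.6 is truncated instead of rounded: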

@HISE
GGATCGCAATGGGTA
+
19
@HISE
ATCGATCGATCGATA
+
22
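
If you would rather get the rounded values from the question (19 and 23), here is a minimal sketch of one way to do it. The function name mean_quality is made up for illustration, and the dictionary is rebuilt on the assumption that every score equals ord(ch) - 33, which matches all the entries listed in the question ('!' is 0, 'J' is 41).

```python
# Assumption: the question's dictionary is exactly ord(ch) - 33 for the
# characters '!' (code 33) through 'J' (code 74). Rebuilt here so the
# snippet runs on its own.
d = {chr(code): code - 33 for code in range(33, 75)}


def mean_quality(quality_line):
    """Return the average score of one quality line, rounded to an int.

    mean_quality is a hypothetical helper name, not part of any library.
    """
    scores = [d[ch] for ch in quality_line.rstrip("\n")]
    # float() forces true division on Python 2 as well as Python 3
    return int(round(sum(scores) / float(len(scores))))


print(mean_quality("CC@!$%*&J#':AAA\n"))  # 290 / 15 = 19.33 -> 19
print(mean_quality("()**D12EFHI@$;;\n"))  # 339 / 15 = 22.6  -> 23
```

Be aware that round() treats exact .5 ties differently in Python 2 and Python 3, so results could differ for averages that land exactly on a half.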

Upvotes: 1

Miki Tebeka

Reputation: 13850

Assuming you read from stdin and write to stdout:

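from sys import stdin

# d is the dictionary from the question
for i, line in enumerate(stdin, 1):
    line = line.rstrip('\n')  # Remove newline
    if i % 4 != 0:
        print(line)  # not a quality line, pass it through unchanged
        continue
    nums = [d[c] for c in line]
    print(sum(nums) / float(len(nums)))  # float average, not truncated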
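
The same idea can be wrapped in a generator so it is easy to try without piping a file in. This is only a sketch: convert is a hypothetical name, the dictionary is rebuilt assuming the ord(ch) - 33 pattern that the question's entries follow, and io.StringIO stands in for stdin.

```python
import io

# Assumption: same mapping as the question's dictionary ('!' -> 0 ... 'J' -> 41).
d = {chr(code): code - 33 for code in range(33, 75)}


def convert(lines):
    """Yield every line unchanged, except each fourth one, which is
    replaced by the float average of its scores (as a string)."""
    for i, line in enumerate(lines, 1):
        line = line.rstrip("\n")
        if i % 4 != 0:
            yield line
            continue
        nums = [d[c] for c in line]
        yield str(sum(nums) / float(len(nums)))


# One record from sample.fastq, fed in as an in-memory file
sample = io.StringIO(u"@HISE\nGGATCGCAATGGGTA\n+\nCC@!$%*&J#':AAA\n")
for out in convert(sample):
    print(out)
```

Because convert accepts any iterable of lines, passing it sys.stdin or an open file object should work the same way.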

Upvotes: 0
