Reputation: 101
I was given a text file containing the coding sequences for various proteins in a certain bacterium. Each entry consists of a short description followed by the amino acid sequence, written as capital single-letter codes. I have been asked to count how often each single-letter amino acid code occurs, in the form:
A: 1567
C: 8776
D: 6643
E: 3345
etc.
What I have so far:
I know it involves using dicts and for loops, so I have written:
#!/usr/bin/python
ecoli = open("/file_pathway.txt").read()
counts = dict()
for line in ecoli:
    words = line.split()
    for word in words:
        if word not in counts:
            counts[word] = 1
        else:
            counts[word] += 1
for key in counts:
    print key, counts[key]
I am just not sure how to edit the if statement so that it only counts the particular uppercase letters I am searching for (i.e. A, C, D, E, L...).
Upvotes: 0
Views: 385
Reputation: 103694
You could use a Counter
from collections import Counter
lets = Counter()
with open(ur_file, 'r') as f:
    for line in f:
        for c in line.strip():
            lets[c] += 1
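The loop above counts every character, including lowercase letters, digits and spaces from the description text. A minimal sketch of one way to restrict it to amino acid codes and print the requested format follows. The names `AMINO_ACIDS`, `count_residues` and `format_counts` are made up for illustration, and the sample lines are invented. Note that uppercase letters inside description lines would still be counted.

```python
from collections import Counter

# The 20 standard one-letter amino acid codes.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

def count_residues(lines):
    """Count amino acid letters across an iterable of text lines."""
    lets = Counter()
    for line in lines:
        # Counter.update adds one count per element of the iterable.
        lets.update(c for c in line if c in AMINO_ACIDS)
    return lets

def format_counts(lets):
    """Return rows such as 'A: 2', sorted alphabetically by letter."""
    return ["%s: %d" % (letter, lets[letter]) for letter in sorted(lets)]

# Invented sample data; a real run would pass the open file object instead.
sample = ["AAC\n", "xyD e\n"]
for row in format_counts(count_residues(sample)):
    print(row)
```

Because an open file is itself an iterable of lines, `count_residues(f)` should work directly inside the `with open(...) as f:` block.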
Upvotes: 0
Reputation: 2804
In [1]: !cat test.dat AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDDDDDDDDDDEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
In [2]: inf = open('test.dat','r') #Create the python file object inf
In [3]: s = inf.readline() #read the first line (here the file's only line) into the string variable s
In [4]: [s.count(i) for i in 'ACDE'] #apply list comprehension to get the letter count
Out[4]: [156, 29, 20, 37]
In [5]: inf.close()
I am assuming that your amino acid sequence is written in the file test.dat as a string (no quotes) and that the file contains nothing except the amino acid sequence string. Result: the 'A' count is 156, the 'C' count is 29, etc. Note: the fact that test.dat shows the letters in sorted order is purely coincidental and irrelevant. The sequence could have been 'AEDC...' and the result would have been the same.
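Since `readline()` returns only the first line, a file whose sequence spans several lines would need `read()` instead. A small sketch of the same `str.count` idea wrapped in a function (`count_letters` is a hypothetical name, and the sample string is built to match the letter counts of test.dat, split over two lines):

```python
def count_letters(text, letters):
    """Map each letter in `letters` to its number of occurrences in `text`."""
    # str.count scans the whole string, newlines included, once per letter.
    return {letter: text.count(letter) for letter in letters}

# Same composition as test.dat, but spread over two lines.
sample = "A" * 156 + "C" * 29 + "\n" + "D" * 20 + "E" * 37 + "\n"
print(count_letters(sample, "ACDE"))
# → {'A': 156, 'C': 29, 'D': 20, 'E': 37}
```

With a real file, the text would come from something like `open('test.dat').read()`. Each `count` call rescans the text, which is fine for a handful of letters but does one pass per letter.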
Upvotes: 0
Reputation: 1953
I like to omit the additional test of each word for being in the dict keys by giving the default value 0 at lookup:
ecoli = open("/file_pathway.txt").read()
counts = dict()
for line in ecoli:  # iterating over a string yields one character at a time
    for word in [w for w in line.split() if w in 'ACDEL']:
        counts[word] = counts.get(word, 0) + 1  # get() returns 0 for a missing key
Upvotes: 0
Reputation: 11691
A couple of things I suggest here. First, you can use collections to make a dictionary that you can just start adding to:
from collections import defaultdict
counts = defaultdict(int)
Then you can just use
counts[word] += 1 #don't need to check if word already exists
If you know what words you are looking for, keep them in a list:
search_words = ['A', 'C' ...]
Then you can check whether the word you care about is in there:
if word in search_words:
    counts[word] += 1
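Putting those pieces together might look like the sketch below. The function name `tally`, the five-letter list and the sample text are illustrative assumptions, not taken from the question's file.

```python
from collections import defaultdict

def tally(text, search_words):
    """Count each character of `text` that appears in `search_words`."""
    counts = defaultdict(int)   # missing keys start at int() == 0
    wanted = set(search_words)  # set membership is faster than a list scan
    for word in text:           # iterating a string yields single characters
        if word in wanted:
            counts[word] += 1   # no "already exists" check needed
    return counts

# Hypothetical list of letters and invented sample text.
search_words = ['A', 'C', 'D', 'E', 'L']
counts = tally("AACDXL\nEEL", search_words)
for key in sorted(counts):
    print("%s: %d" % (key, counts[key]))
```

One thing to keep in mind: merely reading `counts['Q']` on a `defaultdict` inserts the key with value 0, so use `counts.get('Q', 0)` if the dict should stay unchanged.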
Upvotes: 0
Reputation: 76184
Add another if so you only increment counts for accepted letters.
for word in words:
    if word in ["A", "C", "D", "E", "L"]:
        if word not in counts:
            counts[word] = 1
        else:
            counts[word] += 1
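The description lines in the file contain capital letters too, so they probably need to be skipped before counting. The question does not say how descriptions are marked. The sketch below assumes they start with ">" as in FASTA files, which is a guess. `count_sequence_letters` and `header_prefix` are hypothetical names.

```python
ACCEPTED = set("ACDEL")  # the letters used above; extend as needed

def count_sequence_letters(lines, accepted=ACCEPTED, header_prefix=">"):
    """Count accepted letters, ignoring lines that start with header_prefix."""
    counts = {}
    for line in lines:
        # Assumption: description lines begin with ">" (FASTA convention).
        if line.startswith(header_prefix):
            continue
        for letter in line.strip():
            if letter in accepted:
                if letter not in counts:
                    counts[letter] = 1
                else:
                    counts[letter] += 1
    return counts

# Invented sample: one description line followed by two sequence lines.
sample = [">protein A DESCRIPTION\n", "ACDX\n", "LLA\n"]
print(count_sequence_letters(sample))
```

If the file marks descriptions differently, only the `startswith` test would need to change.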
Upvotes: 1