Reputation: 105
I read the contents of a text file and stored them in a list called lines:
print(lines)
['>chr12_9180206_+:chr12_118582391_+:a1;2 total_counts: 115 Seed: 4 K: 20 length: 79\n', 'TTGGTTTCGTGGTTTTGCAAAGTATTGGCCTCCACCGCTATGTCTGGCTGGTTTACGAGC\n', 'AGGACAGGCCGCTAAAGTG\n', '>chr12_9180206_+:chr12_118582391_+:a2;2 total_counts: 135 Seed: 4 K: 20 length: 80\n', 'CTAACCCCCTACTTCCCAGACAGCTGCTCGTACAGTTTGGGCACATAGTCATCCCACTCG\n', 'GCCTGGTAACACGTGCCAGC\n']
If you execute the code
for i in lines:
    print(i)
You get:
>chr12_9180206_+:chr12_118582391_+:a1;2 total_counts: 115 Seed: 4 K: 20 length: 79
TTGGTTTCGTGGTTTTGCAAAGTATTGGCCTCCACCGCTATGTCTGGCTGGTTTACGAGC
AGGACAGGCCGCTAAAGTG
>chr12_9180206_+:chr12_118582391_+:a2;2 total_counts: 135 Seed: 4 K: 20 length: 80
CTAACCCCCTACTTCCCAGACAGCTGCTCGTACAGTTTGGGCACATAGTCATCCCACTCG
GCCTGGTAACACGTGCCAGC
I want to store the uppercase sequences (TTGGTTTCGTGGTTT...) as independent elements of an object so that I can work with them, for example:
seq[1]
>>> TTGGTTTCGTGGTTTTGCAAAGTATTGGCCTCCACCGCTATGTCTGGCTGGTTTACGAGCAGGACAGGCCGCTAAAGTG
Upvotes: 0
Views: 72
Reputation: 103834
I would use a regex:
import re
seq = {}
pattern = r'^(>.*$)\n([ACGTU\n]*?)(?=^>|\Z)'
for i, m in enumerate(re.finditer(pattern, ''.join(lines), flags=re.M)):
    seq[i] = m.group(2).replace('\n', '')
Each FASTA sequence is then keyed by an integer index:
>>> seq
{0: 'TTGGTTTCGTGGTTTTGCAAAGTATTGGCCTCCACCGCTATGTCTGGCTGGTTTACGAGCAGGACAGGCCGCTAAAGTG', 1: 'CTAACCCCCTACTTCCCAGACAGCTGCTCGTACAGTTTGGGCACATAGTCATCCCACTCGGCCTGGTAACACGTGCCAGC'}
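If a name is more convenient than an integer index, the header captured in group 1 of the same pattern can serve as the key. This is only a sketch: seq_by_name is a hypothetical variable, and taking the text between '>' and the first space as the record name is an assumption about how the headers are laid out.

```python
import re

lines = [
    '>chr12_9180206_+:chr12_118582391_+:a1;2 total_counts: 115 Seed: 4 K: 20 length: 79\n',
    'TTGGTTTCGTGGTTTTGCAAAGTATTGGCCTCCACCGCTATGTCTGGCTGGTTTACGAGC\n',
    'AGGACAGGCCGCTAAAGTG\n',
    '>chr12_9180206_+:chr12_118582391_+:a2;2 total_counts: 135 Seed: 4 K: 20 length: 80\n',
    'CTAACCCCCTACTTCCCAGACAGCTGCTCGTACAGTTTGGGCACATAGTCATCCCACTCG\n',
    'GCCTGGTAACACGTGCCAGC\n',
]

pattern = r'^(>.*$)\n([ACGTU\n]*?)(?=^>|\Z)'

seq_by_name = {}  # hypothetical name, not part of the answer above
for m in re.finditer(pattern, ''.join(lines), flags=re.M):
    # Assumption: the record name is everything between '>' and the first space.
    name = m.group(1)[1:].split()[0]
    # Join the wrapped sequence lines into one string.
    seq_by_name[name] = m.group(2).replace('\n', '')

for name, sequence in seq_by_name.items():
    print(name, len(sequence))
```

With the sample data, the joined lengths can be compared against the length field in each header as a quick sanity check.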
Upvotes: 1
Reputation: 2439
To check whether a string is all caps, I would use myString == myString.upper().
To get all caps elements you could use a list comprehension like so:
result = [s for s in lines if s == s.upper()]
This would still allow special characters in your string.
If you only want uppercase letters, also check isalpha() on the stripped line, because the trailing newline is not a letter:
result = [s for s in lines if s == s.upper() and s.strip().isalpha()]
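A small check of how these filters behave may help. The sample lines below are made up for illustration (they are not from the question), and the variable names are hypothetical. Each element is tested individually, and the trailing newline is stripped before calling isalpha(), since a string containing '\n' is never purely alphabetic.

```python
# Made-up sample: one all-caps header, one lowercase header, two sequence lines.
lines = ['>SEQ_1 LENGTH: 8\n', 'ACGTACGT\n', '>seq_2\n', 'TTGA\n']

# Comparing with upper() alone lets an all-caps header through.
caps_only = [s for s in lines if s == s.upper()]

# isalpha() on the raw line rejects everything because of the newline.
with_newline = [s for s in lines if s == s.upper() and s.isalpha()]

# Stripping first keeps only the purely alphabetic uppercase lines.
result = [s.strip() for s in lines if s == s.upper() and s.strip().isalpha()]

print(caps_only)
print(with_newline)
print(result)
```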
Upvotes: 1
Reputation: 8564
You can do this:
lines = list(map(str.strip, (filter(str.isupper, lines))))
Upvotes: 2
Reputation: 26211
gattaca = [x.strip() for x in lines if x.isupper()]
>>> gattaca
['TTGGTTTCGTGGTTTTGCAAAGTATTGGCCTCCACCGCTATGTCTGGCTGGTTTACGAGC',
'AGGACAGGCCGCTAAAGTG',
'CTAACCCCCTACTTCCCAGACAGCTGCTCGTACAGTTTGGGCACATAGTCATCCCACTCG',
'GCCTGGTAACACGTGCCAGC']
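Note that this yields one element per line of the file, so a sequence wrapped over two lines comes back as two fragments. To get one string per record, as in the question, the fragments can be grouped by the header that precedes them. The following is a minimal sketch, assuming every record starts with a '>' line; read_sequences is a hypothetical helper name.

```python
def read_sequences(lines):
    """Return one joined sequence string per '>' header (hypothetical helper)."""
    sequences = []
    current = None  # fragments of the record being read, None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith('>'):
            # A new header closes the previous record, if there was one.
            if current is not None:
                sequences.append(''.join(current))
            current = []
        elif line and current is not None:
            current.append(line)
    # Close the last record.
    if current is not None:
        sequences.append(''.join(current))
    return sequences


lines = [
    '>chr12_9180206_+:chr12_118582391_+:a1;2 total_counts: 115 Seed: 4 K: 20 length: 79\n',
    'TTGGTTTCGTGGTTTTGCAAAGTATTGGCCTCCACCGCTATGTCTGGCTGGTTTACGAGC\n',
    'AGGACAGGCCGCTAAAGTG\n',
    '>chr12_9180206_+:chr12_118582391_+:a2;2 total_counts: 135 Seed: 4 K: 20 length: 80\n',
    'CTAACCCCCTACTTCCCAGACAGCTGCTCGTACAGTTTGGGCACATAGTCATCCCACTCG\n',
    'GCCTGGTAACACGTGCCAGC\n',
]

seq = read_sequences(lines)
print(seq[0])
```

Here seq[0] is the first full sequence and seq[1] the second, since Python lists are indexed from zero.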
Upvotes: 2
Reputation: 935
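With isupper() you can check whether each string in the list is upper case; it returns True when every cased character in the string is uppercase.
sequences = []
for i in lines:
    if i.isupper():
        sequences.append(i.strip())  # store the string without its trailing newline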
Upvotes: 1