Reputation: 339
I have a sample dataset, A, that looks like this:
1:CH,AG,ME,GS;AP,CH;HE,AC;AC,AG
2:CA;HE,AT;AT,AC;AT,OG
3:NE,AG,AC;CS,OD
The expected result should be:
['CH','AG','ME','GS','AP','CH','HE','AC','AC','AG','CA','HE','AT','AT','AC','AT','OG','NE','AG','AC','CS','OD']
I am not sure how to write Python code that turns this data into a single flat list.
Upvotes: 0
Views: 974
Reputation: 1529
Try this if you are on Python 2.7:
a = "CH,AG,ME,GS;AP,CH;HE,AC;AC,AG"
b = "CA;HE,AT;AT,AC;AT,OG"
c = "NE,AG,AC;CS,OD"
d = a + ',' + b + ',' + c  # join the three records with commas
d = d.replace(';', ',')    # treat ';' as just another separator
print(d.split(','))        # output as expected
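If the data arrives as raw lines that still carry the `1:` style prefix, the same idea can be wrapped in a small function. This is only a sketch: the name `flatten_records` and the assumption that everything before the first colon is a record number are mine, not part of the question.

```python
import re

def flatten_records(text):
    """Collect the codes from lines shaped like '1:CH,AG;AP,CH' into one flat list."""
    codes = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumption: the part before the first ':' is a record number to discard.
        _, sep, payload = line.partition(':')
        if not sep:
            payload = line  # no prefix on this line, so keep the whole line
        # Split on either separator and drop any empty pieces.
        codes.extend(part for part in re.split(r'[;,]', payload) if part)
    return codes

sample = """1:CH,AG,ME,GS;AP,CH;HE,AC;AC,AG
2:CA;HE,AT;AT,AC;AT,OG
3:NE,AG,AC;CS,OD"""

print(flatten_records(sample))
```

Going line by line keeps the order of the codes and avoids building the joined string by hand.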
Upvotes: 0
Reputation: 473853
One option would be to find every pair of consecutive upper-case letters with a regular expression:
In [1]: import re
In [2]: data = """
...: 1:CH,AG,ME,GS;AP,CH;HE,AC;AC,AG
...: 2:CA;HE,AT;AT,AC;AT,OG
...: 3:NE,AG,AC;CS,OD"""
In [3]: re.findall(r"[A-Z]{2}", data, re.MULTILINE)
Out[3]:
['CH',
'AG',
'ME',
'GS',
'AP',
'CH',
'HE',
'AC',
'AC',
'AG',
'CA',
'HE',
'AT',
'AT',
'AC',
'AT',
'OG',
'NE',
'AG',
'AC',
'CS',
'OD']
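A slightly stricter variant of the same regex approach, as a sketch: `re.MULTILINE` only changes how `^` and `$` behave, so it can be left out here, and adding `\b` word boundaries avoids picking two letters out of a longer run such as `ABC`. The function name `extract_codes` is hypothetical, and this assumes every code is exactly two upper-case ASCII letters.

```python
import re

def extract_codes(text):
    """Return every standalone two-letter upper-case code, in order of appearance."""
    # \b on both sides rejects pairs that are part of a longer word like 'ABC'.
    return re.findall(r"\b[A-Z]{2}\b", text)

data = """
1:CH,AG,ME,GS;AP,CH;HE,AC;AC,AG
2:CA;HE,AT;AT,AC;AT,OG
3:NE,AG,AC;CS,OD"""

print(extract_codes(data))
```

If the codes can vary in length, splitting on the separators is likely the safer route than matching a fixed width.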
Upvotes: 4