Reputation: 9920
I have a string 'ABCAPITAL23JAN140CE'. This is the symbol for an option traded on a stock exchange. The ABCAPITAL part of the string is the company name, 23 is the year 2023, JAN is the month, 140 is the strike price, and CE is the type of the option.
All these components can vary for different options.
I need a function such that pieces_of_string = splitstring('ABCAPITAL23JAN140CE')
returns pieces_of_string = ['ABCAPITAL', 23, 'JAN', 140, 'CE'].
How do I do that?
Upvotes: 2
Views: 89
Reputation: 163362
You might use re.findall with [A-Z]+|\d+
See the matches here on regex101
import re
print(re.findall(r"[A-Z]+|\d+", "ABCAPITAL23JAN140CE"))
# Or converting to int
print([int(v) if v.isdigit() else v for v in re.findall(r"[A-Z]+|\d+", "ABCAPITAL23JAN140CE")])
Output
['ABCAPITAL', '23', 'JAN', '140', 'CE']
['ABCAPITAL', 23, 'JAN', 140, 'CE']
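To get the splitstring function named in the question, the findall approach above could be wrapped roughly like this. It is a minimal sketch that assumes symbols contain only uppercase letters and digits; any other character is silently skipped by the pattern.

```python
import re

def splitstring(symbol):
    # Alternating runs of uppercase letters and digits; other characters are ignored
    parts = re.findall(r"[A-Z]+|\d+", symbol)
    # Convert the all-digit runs to int and leave the letter runs as strings
    return [int(p) if p.isdigit() else p for p in parts]

print(splitstring("ABCAPITAL23JAN140CE"))
```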
Another option uses 4 capture groups and matches the digits around the abbreviated month, such as JAN, FEB, etc. The month itself is in a non-capturing group, so it is matched but not returned:
^(\S*?)(\d+)(?:JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)(\d+)(\S+)$
See the capture group matches on regex101
import re
m = re.match(r"(\S*?)(\d+)(?:JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)(\d+)(\S+)$", "ABCAPITAL23JAN140CE")
if m:
    print(list(m.groups()))
Output
['ABCAPITAL', '23', '140', 'CE']
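The pattern above leaves the month out of the result because it sits in a non-capturing group. If the month is wanted as well, as in the question, one possible variation is to turn it into a capture group and convert the two numeric groups to int. The function name split_option_symbol is made up for this sketch, and it assumes the strike price is a whole number.

```python
import re

# Same pattern as above, but the month is now a capture group
PATTERN = re.compile(
    r"(\S*?)(\d+)(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)(\d+)(\S+)$"
)

def split_option_symbol(symbol):
    m = PATTERN.match(symbol)
    if not m:
        return None  # the symbol does not follow the expected layout
    name, year, month, strike, kind = m.groups()
    return [name, int(year), month, int(strike), kind]

print(split_option_symbol("ABCAPITAL23JAN140CE"))
```

Unlike the plain findall version, this one rejects strings that do not contain a year, a known month abbreviation, a strike and an option type in that order.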
Upvotes: 9
Reputation: 677
This method tests whether two adjacent characters are of the same type (both digits or both letters). If they are, the character is appended to the current piece; otherwise the string is split at that point.
st = 'ABCAPITAL23JAN140CE'
l = []
s = ""
for i in range(len(st)):
    s = s + st[i]
    # Split at the end of the string or when the next character changes type
    if i == len(st) - 1 or st[i].isnumeric() != st[i+1].isnumeric():
        # Store digit runs as int and letter runs as str
        l.append(int(s) if s.isnumeric() else s)
        s = ""
print(l)
Output:
['ABCAPITAL', 23, 'JAN', 140, 'CE']
Upvotes: 1
Reputation: 1
import re
print(re.findall(r"[A-Z]+|\d+", "ABCAPITAL23JAN140CE"))
Upvotes: -1
Reputation: 664
def splitstring(s):
    l = [s[0]]
    for h in s[1:]:
        H = h + l[-1][0]
        if H.isdigit() or H.isalpha():
            l[-1] += h
        else:
            l.append(h)
    return l
print(splitstring('ABCAPITAL23JAN140CE'))
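The same idea of grouping adjacent characters of the same type can also be written with itertools.groupby from the standard library. This is a sketch of an alternative, not the code above: the name splitstring_groupby is made up for illustration, and it additionally converts the digit runs to int to match the output asked for in the question. It assumes plain ASCII input.

```python
from itertools import groupby

def splitstring_groupby(s):
    pieces = []
    # groupby yields consecutive runs of characters that share the same key,
    # here whether the character is a digit or not
    for is_digit, run in groupby(s, key=str.isdigit):
        text = "".join(run)
        pieces.append(int(text) if is_digit else text)
    return pieces

print(splitstring_groupby('ABCAPITAL23JAN140CE'))
```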
Upvotes: 1