KawaiKx

Reputation: 9920

How do I break down a string into multiple components?

I have a string 'ABCAPITAL23JAN140CE'. This is the symbol for an option traded on a stock exchange: ABCAPITAL is the company name, 23 is the year 2023, JAN is the month, 140 is the strike price, and CE is the type of the option.

All these components can vary for different options.

I need a function such that pieces_of_string = splitstring('ABCAPITAL23JAN140CE') returns

['ABCAPITAL', 23, 'JAN', 140, 'CE']

with the year and the strike price as integers. How do I do that?

Upvotes: 2

Views: 89

Answers (4)

The fourth bird

Reputation: 163362

You could use re.findall with the pattern [A-Z]+|\d+, which matches either a run of uppercase letters or a run of digits.

See the matches here on regex101

import re
print(re.findall(r"[A-Z]+|\d+", "ABCAPITAL23JAN140CE"))

# Or converting to int
print([int(v) if v.isdigit() else v for v in re.findall(r"[A-Z]+|\d+", "ABCAPITAL23JAN140CE")])

Output

['ABCAPITAL', '23', 'JAN', '140', 'CE']
['ABCAPITAL', 23, 'JAN', 140, 'CE']
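Wrapped into the splitstring function the question asks for, the same call might look like the sketch below. It assumes the symbol contains only uppercase letters and digits; any other character (for example & or -) would be silently dropped by this pattern.

```python
import re


def splitstring(symbol):
    """Split a symbol into runs of uppercase letters and runs of digits.

    Digit runs are converted to int; letter runs stay as strings.
    Assumes the symbol contains only A-Z and 0-9.
    """
    pieces = re.findall(r"[A-Z]+|\d+", symbol)
    return [int(p) if p.isdigit() else p for p in pieces]


pieces_of_string = splitstring('ABCAPITAL23JAN140CE')
print(pieces_of_string)  # ['ABCAPITAL', 23, 'JAN', 140, 'CE']
```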

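Another option is a pattern with 4 capture groups: one for the name, one each for the digits of the year and the strike price, and one for the option type. The month abbreviation (JAN, FEB, etc.) is matched by a non-capturing group, so it does not appear in the result.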

^(\S*?)(\d+)(?:JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)(\d+)(\S+)$

See the capture group matches on regex101

import re
m = re.match(r"(\S*?)(\d+)(?:JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)(\d+)(\S+)$", "ABCAPITAL23JAN140CE")
if m:
    print(list(m.groups()))

Output

['ABCAPITAL', '23', '140', 'CE']
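If the month should be kept as well, a variant with named groups could look like this. It is a sketch: the group names, SYMBOL_RE and parse_symbol are hypothetical, and it assumes the year is always two digits and the strike price is an integer.

```python
import re

# Hypothetical named-group variant of the pattern above; the month is captured too.
SYMBOL_RE = re.compile(
    r"(?P<name>\S*?)"   # company name (lazy, so it stops at the year)
    r"(?P<year>\d{2})"  # two-digit year, e.g. 23 for 2023 (assumption)
    r"(?P<month>JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)"
    r"(?P<strike>\d+)"  # strike price (integer strikes only)
    r"(?P<kind>\S+)"    # option type, e.g. CE
)


def parse_symbol(symbol):
    """Return the components as a dict, or None if the symbol does not match."""
    m = SYMBOL_RE.fullmatch(symbol)
    if m is None:
        return None
    parts = m.groupdict()
    parts["year"] = int(parts["year"])
    parts["strike"] = int(parts["strike"])
    return parts


print(parse_symbol('ABCAPITAL23JAN140CE'))
# {'name': 'ABCAPITAL', 'year': 23, 'month': 'JAN', 'strike': 140, 'kind': 'CE'}
```

list(parse_symbol('ABCAPITAL23JAN140CE').values()) gives the flat list from the question. Because the name group is \S*? rather than [A-Z]+, a name containing a character such as & would be kept whole (the question itself only shows letters).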

Upvotes: 9

alphaBetaGamma

Reputation: 677

This method tests whether two adjacent characters are of the same type (both digits or both letters); if so, it keeps concatenating them, otherwise it closes the current piece and starts a new one.

    st = 'ABCAPITAL23JAN140CE'
    l = []
    s = ""
    for i in range(len(st)):
        s = s + st[i]
        # Close the current piece at the last character, or when the next
        # character switches between numeric and non-numeric
        if i == len(st) - 1 or st[i].isnumeric() != st[i + 1].isnumeric():
            if s.isnumeric():
                l.append(int(s))
            else:
                l.append(s)
            s = ""
    print(l)

Output:

['ABCAPITAL', 23, 'JAN', 140, 'CE']
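The same idea, grouping runs of adjacent characters of the same type, can also be written with itertools.groupby from the standard library, keyed on whether each character is a digit. This is a swapped-in alternative to the loop above rather than the answer's own code, and split_runs is a hypothetical name.

```python
from itertools import groupby


def split_runs(symbol):
    """Group consecutive characters by whether they are digits."""
    pieces = []
    # groupby starts a new group every time the key (is it a digit?) changes
    for is_digit, chars in groupby(symbol, key=str.isdigit):
        run = "".join(chars)
        pieces.append(int(run) if is_digit else run)
    return pieces


print(split_runs('ABCAPITAL23JAN140CE'))  # ['ABCAPITAL', 23, 'JAN', 140, 'CE']
```

Because the key is only "digit or not", any non-digit characters stay together in one run instead of being dropped.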

Upvotes: 1

Ajay kodthiwada

Reputation: 1

import re
print(re.findall(r"[A-Z]+|\d+", "ABCAPITAL23JAN140CE"))

Upvotes: -1

islam abdelmoumen

Reputation: 664

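def splitstring(s):
    l = [s[0]]
    for h in s[1:]:
        # Pair the new character with the first character of the current piece:
        # if both are digits or both are letters, they belong to the same piece
        H = h + l[-1][0]
        if H.isdigit() or H.isalpha():
            l[-1] += h
        else:
            l.append(h)
    # Convert the digit pieces to int, as the question asks
    return [int(p) if p.isdigit() else p for p in l]


print(splitstring('ABCAPITAL23JAN140CE'))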

Upvotes: 1
